Abstract: We present new estimators of the mean of a real valued random
variable, based on PAC-Bayesian iterative truncation. We analyze the
non-asymptotic minimax properties of the deviations of estimators for
distributions having either a bounded variance or a bounded kurtosis. It turns
out that these minimax deviations are of the same order as the deviations of
the empirical mean estimator of a Gaussian distribution. Nevertheless, the
empirical mean itself performs poorly at high confidence levels for the worst
distribution with a given variance or kurtosis (which turns out to be heavy
tailed). To obtain (nearly) minimax deviations in these broad classes of
distributions, it is necessary to use some more robust estimator, and we
describe an iterated truncation scheme whose deviations are close to minimax.
In order to calibrate the truncation and obtain explicit confidence intervals,
it is necessary to have at hand a prior bound either on the variance or on the
kurtosis. When a prior bound on the kurtosis is available, we obtain as a
by-product a new variance estimator with good large deviation properties. When
no prior bound is available, it is still possible to use Lepski's approach to
adapt to the unknown variance, although it is no longer possible to obtain
observable confidence intervals.
Introduction

This paper is devoted to the estimation of the mean of a real random variable
from an independent identically distributed sample. We will emphasize the
following issues:

• obtaining non-asymptotic confidence intervals;

• getting high confidence levels;

• proving nearly minimax bounds in the class of distributions with a bounded
variance and in the class of distributions with a bounded kurtosis.

To achieve these goals, we combine two kinds of tools: truncated estimates and
PAC-Bayesian theorems ([9, 8, 10, 5, 2, 1]).

The general conclusion is that the empirical mean estimate behaves poorly at
high confidence levels and that the worst case is reached for heavy tailed
distributions, as the proofs of the lower bounds show.

This is the bad news. The good news is that, using iterated truncation schemes,
it is possible to recover confidence intervals whose widths are close to the
(optimal) width of the confidence interval of the empirical mean of a Gaussian
distribution, even at very high confidence levels. From a technical point of
view, it is possible to build an estimator with an exponential tail even when
the sample distribution has only a finite variance. This came to us as a
surprise while working on the more elaborate topic of regression estimation
[3], and spurred us to work out the estimation of the mean in detail, this
simpler case lending itself to tighter computations.

The weakest hypothesis we will consider is the existence of a finite variance.
While it is possible to adapt a truncation scheme when the variance is unknown,
using Lepski's approach, some more information is required to compute an
observable confidence interval. We study two situations: the case when the
variance or some upper bound is known and the case when the kurtosis or some
upper bound is known. In order to assess the quality of the results, we prove
corresponding lower bounds for the best estimator of the worst distribution
(following thus the minimax approach), and for the empirical mean estimate of
the worst distribution, to assess the improvement brought by the PAC-Bayesian
truncation scheme. We plot the numerical values of these upper and lower bounds
for typical finite sample sizes to show the gap between them.

Let us end this introduction with a few words in favour of high confidence
levels. One reason to seek them is when the estimated quantity is critical from
a safety or economic point of view. We will not elaborate on this. Another
setting where high confidence levels are required is when lots of estimates are
to be computed and compared in some statistical learning scenario. Let us
imagine, for instance, that some parameter $\theta \in \Theta$ is to be tuned in
order to optimize the answer of some loss function $f_\theta$ to some random
input $X$. Let us consider a split sample scheme where two i.i.d. samples
$(X_1, \dots, X_s) = X_1^s$ and $(X_{s+1}, \dots, X_{s+n}) = X_{s+1}^{s+n}$
are used, one to build some estimators $\widehat{\theta}_k(X_1^s)$ of
$\arg\min_{\theta \in \Theta_k} E[f_\theta(X)]$ in subsets $\Theta_k$,
$k = 1, \dots, K$ of $\Theta$, and the other to test those estimators and keep
hopefully the best. This is a very common model selection situation. One can
think for instance of the choice of a basis to expand some regression function.
If $K$ is large, estimates of $E[f_{\widehat{\theta}_k(X_1^s)}(X_{s+1})]$ will
be required for a lot of values of $k$. In order to keep safe from over-fitting,
very high confidence levels will be required if the resulting confidence level
is to be computed through a union bound (because no special structure of the
problem can be used to do better). Namely, a confidence level of $1 - \epsilon$
on the final result of the optimization on the test sample will require a
confidence level of $1 - \epsilon/K$ for each mean estimate on the test sample.
Even if $\epsilon$ is not very small (like, say, 5/100), $\epsilon/K$ may be
very small. For instance, if 10 parameters are to be selected among a set of
100, this gives $K = \binom{100}{10} \simeq 1.7 \cdot 10^{13}$. In practice
some heuristic scheme will be used to compute only a limited number of
estimators $\widehat{\theta}_k$, like adding parameters one at a time, choosing
at each step the one with the best estimated performance increase (in our
example, this requires computing about 1000 estimators instead of
$\binom{100}{10}$). Nonetheless, asserting the quality of the resulting choice
requires a union bound on the whole set of possible outcomes of the data driven
heuristic, and therefore calls for very high confidence levels for each
estimate of the mean performance $E[f_{\widehat{\theta}_k(X_1^s)}(X_{s+1})]$ on
the test set.
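
The arithmetic of this union bound can be checked with a few lines of Python. This is only an illustrative sketch: the function name `per_estimate_level` is ours, and the count of greedy steps assumes that one parameter is added at a time, ten times, out of 100 candidates.

```python
import math


def per_estimate_level(eps: float, n_candidates: int) -> float:
    """Failure probability allowed for each estimate under a union bound."""
    return eps / n_candidates


# Number of ways to select 10 parameters among 100.
K = math.comb(100, 10)

# Overall failure probability of 5/100, shared among the K estimates.
eps = 0.05
eps_each = per_estimate_level(eps, K)

# Greedy forward selection: 100 candidates at the first step, 99 at the
# second, ..., 91 at the tenth, which is the "about 1000" of the text.
greedy_evaluations = sum(100 - j for j in range(10))

print(f"K = {K:.3e}")
print(f"eps / K = {eps_each:.3e}")
print(f"greedy evaluations = {greedy_evaluations}")
```

Each individual mean estimate therefore has to hold at a confidence level of order $1 - 10^{-15}$, which is the regime addressed in this paper.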

Our study has several reasons to recommend itself as addressing the question
of robust statistics: we prove distribution free bounds, truncation operates
mainly on outliers and the lower bounds show that the worst behaviour of the
empirical mean is achieved on heavy tailed distributions. However, our point of
view is quite different from the classical setting of robust statistics, as
epitomised by Peter Huber [6]. Indeed, our framework is not perturbative (we do
not assume the sample to be drawn from a mixture of known and unknown
distributions), and is not asymptotic either, since it is based on finite
sample exponential inequalities for suitable auxiliary variables. Moreover
Huber's approach is a minimax study of the variance of estimators, whereas we
analyze the minimax properties of their deviations. From a more practical point
of view, the fact that the empirical mean is unstable is well known, and any
statistical package provides tools to deal with outliers. It is interesting
though that it shows in the equations, even when no sample contamination is
assumed, by simply considering a minimax setting on a broad set of
distributions including heavy tailed ones, and looking at the deviations of
estimators, rather than focussing on their variance.



1. Some truncated mean estimate

Let $(Y_i)_{i=1}^n$ be an i.i.d. sample drawn from some unknown probability
distribution $P$ on the real line $\mathbb{R}$ equipped with the Borel
$\sigma$-algebra $\mathcal{B}$. Let $Y$ be independent from $(Y_i)_{i=1}^n$
with the same marginal distribution $P$. Let $m$ be the mean of $Y$ and let $v$
be its variance:

$$E(Y) = m \quad \text{and} \quad E[(Y - m)^2] = v.$$

Starting from some initial guess $\theta_0$ about the value of the mean, prior to any



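observation, let us consider the thresholded estimator

$$\widehat{\theta}_\alpha(\theta_0) = \theta_0 + \frac{1}{n\alpha} \sum_{i=1}^n T[\alpha(Y_i - \theta_0)], \tag{1.1}$$

where the threshold function $T$ is defined as

$$T(x) = \begin{cases} \log\bigl(1 + x + \frac{x^2}{2}\bigr), & x \ge 0, \\ -\log\bigl(1 - x + \frac{x^2}{2}\bigr), & x \le 0. \end{cases}$$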
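
The thresholded estimator can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: it takes $T(x) = \log(1 + x + x^2/2)$ for $x \ge 0$ extended as an odd function, which is consistent with the symmetry of $T$ and the bounds $T_- \le T \le T_+$ used in the proofs; the names `T` and `truncated_mean`, the contaminated sample and the value of `alpha` are illustrative choices of ours.

```python
import math
import random


def T(x: float) -> float:
    """Threshold function: log(1 + x + x^2/2) for x >= 0, odd extension for x < 0."""
    if x >= 0:
        return math.log(1.0 + x + 0.5 * x * x)
    return -math.log(1.0 - x + 0.5 * x * x)


def truncated_mean(sample, theta0: float, alpha: float) -> float:
    """theta0 + (1 / (n * alpha)) * sum_i T(alpha * (Y_i - theta0))."""
    n = len(sample)
    return theta0 + sum(T(alpha * (y - theta0)) for y in sample) / (n * alpha)


# Illustration: 999 standard Gaussian draws plus one huge outlier.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(999)] + [1e6]

# Illustrative scale parameter, assuming a prior variance bound v0 = 1.
v0 = 1.0
alpha = math.sqrt(2.0 / (len(sample) * v0))

estimate = truncated_mean(sample, theta0=0.0, alpha=alpha)
empirical_mean = sum(sample) / len(sample)

print(f"truncated estimate = {estimate:.3f}")
print(f"empirical mean     = {empirical_mean:.3f}")
```

Since $T$ grows only logarithmically, the single outlier moves the truncated estimate by a bounded amount, whereas it shifts the empirical mean by about $10^6 / n$.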
Proposition 1.1 Assume that $v \le v_0$ and $|\theta_0 - m| \le \delta_0$, where $v_0$ and $\delta_0$ are
known prior bounds. With probability at least $1 - 2\epsilon$,

_0 

3 



\0a{0o) - m\ < ^ + 

2 na 



;21og(e-i) 

Choosing a = \ , we get, with probability 1 — 2e, 
Choosing a = \l independently of e we get with probability at least 1 - 2e,

nvo 



\9a{0o) - m\ 



nvoj 



Proposition 1.2 Assume that $v \le v_0$ and $|m - \theta_0| \le \delta_0$, where $v_0$ and $\delta_0$ are
known prior bounds. With probability at least $1 - 2\epsilon$,



na 



Choosing a = \ l J. , we get 

y n{vo + 6^) 



9a{9o)-m\ < 



n 



Let us remark that the estimates proved here are valid for any confidence level
$1 - 2\epsilon$. In particular, when $\widehat{\theta}_\alpha(\theta_0)$ is
independent of $\epsilon$, it has a subexponential tail distribution, even in
the case when $P$ does not. The proofs are gathered in the last section of the
paper.

2. Iterated mean estimates

The width of the confidence intervals proved in the previous section depends
heavily on the value of the prior bound $\delta_0$. On the other hand, they lend
themselves naturally to an iterated scheme. Here we will iterate Proposition 1.2,
where the dependence of the bound on $\delta_0$ is the best.

Proposition 2.1 Let us assume that $v \le v_0$ and $|m - \theta_0| \le \delta_0$, where $v_0$, $\theta_0$
and $\delta_0$ are known prior to observing the sample. Let $U_i$, $i = 2, \dots, k$ be uniform
real random variables in the interval $(-1, +1)$, independent of each other and of
everything else. Let us define



V n 



2\og{e-,'] 



9i = 6*0,1 (6*0)5 



li = log(l + 3;^ ^ 
5,: = 



a 



n 



2[log(eri)+7,] 



With probability at least 1 — 2 e^, 



1=1 



Let us see how it behaves, choosing $x_i = 1/10$ and $\epsilon_1 = \dots = \epsilon_{k-1} =
(\epsilon - \epsilon_k)/(k - 1) = \epsilon/10$. The following two plots of $\delta_k$ against $\epsilon$ show that this
iterated estimate permits very large values of the prior bound $\delta_0$, without any
substantial loss of accuracy (for a suitable number of iterations).
3. With no prior knowledge of the mean



In the case when the prior bound $\delta_0$ is not available, we can modify the iterated
scheme of the previous section, using the empirical mean estimator as a first step.

Proposition 3.1 Let us assume that $v \le v_0$, where $v_0$ is a known prior bound.
Let $U_i$, $i = 2, \dots, k$ be uniform real random variables in the interval $(-1, +1)$,
independent of each other and of everything else. Let us define



2nei ' 
1 " 



i=l 



n 



2[log(e7i)+7, 



h + (1 + x,)Ht^ ' 

Qi = Oai {Oi-1 + Xi5i-iUi) , 



With probability at least $1 - 2\sum_{i=1}^k \epsilon_i$,

$$|m - \widehat{\theta}_k| \le \delta_k.$$

So in this iterated scheme, we start from the empirical mean estimator and
improve it gradually. We will show later that the confidence interval used here
for the empirical mean is close to optimal in the worst case. What the next plot
shows, therefore, is that the iterated estimate brings a huge improvement for
high confidence levels, making it possible to stay close to the deviations of
the empirical mean of a Gaussian distribution for confidence levels virtually
as high as wished: this iterative truncation scheme behaves almost as the
empirical mean estimate of a Gaussian distribution would behave, for any
distribution with a known finite variance, and beats the empirical mean in the
worst case for confidence levels starting from around 94% for a sample of size
1000.

[Figure: $n = 1000$, $v_0 = 1$, starting from the empirical mean. Horizontal
axis: $\epsilon$, starting from 0.5, confidence level $= 1 - 2\epsilon$.]
4. Last step improvement



In this section, we introduce a more elaborate estimate to perform the last step
of the iteration. The result of the previous steps will be described as $\widehat{\theta}_1$, assumed
to be some mean estimator satisfying with probability at least $1 - 2\epsilon_1$

$$|m - \widehat{\theta}_1| \le \delta_1. \tag{4.1}$$



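Let us consider, for any $\theta_0 \in \mathbb{R}$, the Gaussian distribution on the real line
with variance $(n\beta\alpha^2)^{-1}$ and mean $\theta_0$, where $\alpha$ and $\beta$ are positive parameters to
be chosen later:

$$\rho_{\theta_0}(d\theta) = \sqrt{\frac{n\beta\alpha^2}{2\pi}} \exp\Bigl[-\frac{n\beta\alpha^2}{2}(\theta - \theta_0)^2\Bigr] d\theta.$$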



This will serve to define some truncated mean estimate 

1 " r 



1=1 



l + a(F,-^o) + y(>^.-^o)' 



-|:iog|l + a«-W + ^(V;-()o)^+^}. (4.2) 



Proposition 4. 1 With probability at least 1 - e, for any Oq e R, 



— naMaiOo) < naiO^ — m) 



+ 



na 



{00 -mf + v 



Let us insist on the fact that this result holds with probability 1 — 2e uniformly 
with respect to 9o, which may therefore be a random variable — such as 9i — if 
required. 

Proposition 4.2 For any 9q eR, with probability at least 1 - e, 



naMa{Oo) < —na{9Q — m) + 



na 



{9o -mf + v 



+ ^-log(.). 



Proposition 4.3 Let $v \le v_0$, where $v_0$ is some known prior bound. Let $\widehat{\theta}_1$ be
some estimator satisfying equation (4.1) with probability at least $1 - 2\epsilon_1$. Let us
consider the estimator



= inf <^ e>ei-5i:M, 



Let us define the ancillary function 

2x 



< 



X 



-, x<l/4, 



1 + VI -4x ~ l-2x' 
+00, otherwise. 



For any real positive constants and a such that 

Ana6i < [na^v + /3~^ + 2 log(e2 ^)] 



An 



which is the case at least when 
62 > exp< —n 



with probability at least 1 — 2ei — 2e2, 



1 ^ (na'^v + p-^) 

— adi — — 



2n 



\m — 6n\ < 



[1 + (3)[na% + /3-^ - 2\og{e2)] 



Considering a 



1 + I3)a 



An 



nvo 



, we deduce that as soon as 



26i < 



(3-^ + 2log{e^')]vo ( (l + /5)[r' + 21og(e2i 



n 



2n 



which is the case at least as soon as 

. 1 

£2 > exp 



nvo 



2(3 2{1 + (3)^61 + 8{l + (3)vo 
with probability at least 1 — 2ei — 2e2, 

2 



(4.3) 



(4.4) 



-1 



(4.5) 



(4.6) 



\m — 9n\ < 



nvo 



(1 + /3) V/5-i-21og(e2) 



/ (l + /3)[r^-21og(62 

' 2n 
In the following plot, we took the same parameters as in the previous section,
and substituted only the last step. It shows some improvement, especially for
moderate confidence levels, but requires more involved computations.

[Figure: $n = 1000$, $v_0 = 1$, first step = empirical mean, improved last step]
page 10) breaks.) This is what we obtain when we decrease n to 300,

[Figure: $n = 300$, $v_0 = 1$, first step = empirical mean, improved last step]



When the sample size is thus decreased, the last step improvement works up to 
e ^ 10^^, after which the iterated estimate without last step improvement takes 
the lead, as shown on the previous plot. 



5. Mean estimate from a kurtosis prior bound 

Situations where the variance is unknown are likely to happen. It is possible 
to deal with them while making assumptions on the kurtosis. 

More precisely, let us introduce some uniform kurtosis coefficient, that we
define as

$$c = \sup_{\theta \in \mathbb{R}} \frac{E[(Y - \theta)^4]}{E[(Y - \theta)^2]^2}.$$

Its relation to the classical centered kurtosis $\kappa = \frac{E[(Y - m)^4]}{v^2}$ is given by the
following lemma.



Lemma 5.1 The two kurtosis coefficients defined above satisfy the inequalities 

9 



On the other hand, if $\kappa_P$ and $c_P$ are the kurtosis and uniform kurtosis of the
probability measure $P$,

$$\sup_P \; c_P - \kappa_P = 2,$$

where the supremum is taken over all probability measures on the real line,
proving that the previous bound is tight in the worst case.

Anyhow, in the favourable case when the skewness is null, meaning that 
E[(y — m)^] = 0, the two coefficients are equal whenever k > 3, and more 
precisely 

(3 - 
5 — K 

K, K, > 3. 

Let us consider for any 9o eM, and 5 g)0, 1) the estimator of the mean 9aido) 
already considered in previous sections and defined by equation ( |1.1[ page|4]). Let 
us also consider the increasing mapping a G R+ ^ Qe,5{(^) defined as 



1 " r 1 r 

Qe^o^) = - V log 1 + - Of - 5 + - a{Y, - Of - 5 
n ^-^ I 2 L 



2 
We will use the ancillary function h{a, y)



Ay 



[l + y){l + Jl- 



Aay^ 



The next proposition is concerned with random confidence intervals, whose
lengths are defined with the help of some estimator of the variance. More
precisely, we are going to iterate a process where we successively estimate
$v + (m - \theta)^2$ and $m$.



Proposition 5.2 Let us choose positive real constants $x_i$, and confidence levels
$\epsilon_i$, $i = 1, \dots, 2k$. Let $U_i$, $i = 2, \dots, 2k$ be uniform real random variables in the
interval $(-1, +1)$, independent of each other and of everything else. Let us start
with some prior guess $\theta_1$ for $m$ and let us define by induction the sequence of
values




Ci 



2iog(6r^) 

(c-l)n ' 



"2 logjl - h 

Si exp(-Ci; 



q2 = qiexp{x2CiU2), 
72 = log(l + a;2^), 



a2 = exp 



C2 = exp 



l + X2)Cil /2[log(e2-')+72] 



l + a;2)Cil /2g2[log(e2')+72] 



n 



O2 — ^02(^1)) 



72j-i = 72i-2 + log(l + a;2/_i) 

S2i-1 = 



2 log(e2,_i) +72i-i 



(c — l)n ' 

^2«-l — ^2i-2 + C2i-2X2i-lU2i-l, 

C2i-1 = -^logjl - /l (C- l)(52i_l } 



52i_iexp(-C2j-l) 



5 

t'2i-l,02i-l 



q2i = exp(a;2iC2i-i^2i), 
72i = 72i-i + log(l + 0:2/), 

1 + 2;2i)C2i-i 



Q;2i = exp 



C2i = exp 



2[log(e2-/)+72i] 



nq2i 



[1 + .T2i)C2i-l 



2g2i [log(e2/) + 72i] 



^2i — ^a2i(^2j-l), 



Le? M5 remark that 7, = log(l + Xj 821-1, and C,2i-i i — i, ■ ■ ■ ,k are 

i=2 

random and known prior to observing the sample. 

Let us assume that max 52i-i < — , . 

i=h-,k 2^C{C- 1) - (C - 1) 

With probability at least 1 — 2 Ylfti far any i = 1, . . . , k, 

\m -02i\ < C2i, 
|log(fe_i) - logft; + (m - ^2j-i)^] I < C2i-i- 



non 



As a consequence, on the same event of probability at least 1 — 2 X^j^^ e^, far any 
i = 2, . . . , /c, 

(m - 92i-lY < (1 + X2i^lfCii-2, 

fe-i < + (1 + :r2;-i)'CL2] exp(G,-i). 



C2i < exp[(l + X2i)C2i-l] 



C2 <exp[(l + X2)Ci]' 



^2 [t; + (1 + X2i-ini_2] [logjeg') + 72^] 
n 

1 2[v + {m - Oi)^] [log(e2 + 72] 
n ' 



exp(-C2i-i)g2i-i - (1 + X2i-i) (21-2 <v < exp(C2i-i)g2i-i- 

These equations allow to compute by induction a (non observable) deterministic 
bound for C2k, which is itself a random observable confidence interval half width 
for the estimate of the mean given by 62k- The last equation (better used with i — 
k) shows that we get as a by-product an estimate of the variance with observable
as well as theoretical confidence bounds.



In the sequel, we will give lower and upper bounds for the worst case behaviour
of the empirical mean depending on the variance and the kurtosis. Note that
here we do better, since we also estimate the variance and assume only a known
prior bound on the kurtosis. Obtaining a similar observable confidence interval
for the empirical mean would require estimating the variance under a kurtosis
bound, which is not straightforward, as will also be discussed a little later.
In the following plot, we chose a sample of size 2000, a size where things
start to behave nicely under these assumptions.

Although the lower and upper deviation bounds shown for the empirical mean 
estimator do not correspond to observable confidence intervals, we see that for 
confidence levels higher than 1 — 2 • 10~^, the observable confidence interval of 

our estimator outperforms the deviations of the empirical mean, up to confidence 
levels as high as 1 — 2 ■ lO^^''. We also plotted the upper estimate for the standard 
deviation (assumed to be equal to one). We took Xi = 0.5, i < 2k, X2k = 0.1 
and assumed that |m — | < lOy^. We kept the kurtosis to 3, the kurtosis of the 
Gaussian distribution. 

[Figure: $n = 2000$, $v = 1$ (unknown), $c = 3$ (known)]

[Figure: $n = 2000$, $v = 1$ (unknown), $c = 6$ (known)]
This is the influence of the kurtosis on the bounds for a sample of size n = 5000:

[Figure: $n = 5000$, $v = 1$ (unknown), influence of $c$. Curves: empirical mean
in the Gaussian case; iterated mean/variance estimator, 2 iterations, $c = 3$;
empirical mean estimator, worst case lower bound, $c = 3$; iterated
mean/variance estimator for $c = 6$, $12$, $18$ and $24$. Horizontal axis:
$\epsilon$, starting from 0.5, confidence level $= 1 - 2\epsilon$.]



6. Adapting to an unknown variance when the kurtosis is unknown or even infinite



In this section, we will point out that Lepski's renowned adaptation method
can be put to good use when nothing is known, neither the variance (still
assumed to be finite) nor the kurtosis (not even assumed to be finite!). Of
course, under such uncertain (but unfortunately so frequent) circumstances, it
is not possible to provide any observable confidence level. Nevertheless, it is
still possible to adapt to the variance and to give deviation bounds depending
on the unknown variance. Here, a clear distinction should be made between
adapting to the variance and estimating the variance: estimating the variance
at any predictable rate is impossible in this context where we do not assume
any higher moment to be bounded.

The idea of Lepski's method is powerful and simple: consider a sequence of
confidence intervals obtained by letting the assumed variance bound $v_0$ range
over a set of possible values, and pick as an estimator the middle of the
smallest interval intersecting all the larger ones. For this to be legitimate,
we need all the confidence regions for which the variance bound is valid to
hold together, which is achieved using a union bound.
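
The informal selection rule just described can be sketched as follows. This is only an illustration of the rule "smallest interval intersecting all the larger ones": the function names and the numerical intervals are hypothetical, the intervals are assumed to be listed by increasing assumed variance bound (hence from narrowest to widest), and the more precise construction given afterwards is not reproduced here.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower end, upper end)


def intersects(a: Interval, b: Interval) -> bool:
    """True when the two closed intervals share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def lepski_select(intervals: List[Interval]) -> float:
    """Middle of the first (narrowest) interval that meets every wider one.

    `intervals` must be ordered by increasing assumed variance bound.
    """
    for j, candidate in enumerate(intervals):
        if all(intersects(candidate, wider) for wider in intervals[j + 1:]):
            return 0.5 * (candidate[0] + candidate[1])
    raise ValueError("at least one interval is required")


# Hypothetical intervals: the first one assumes too small a variance bound
# and is inconsistent with the wider, more trustworthy ones.
intervals = [(2.0, 2.2), (0.8, 1.4), (0.5, 1.9), (-1.0, 3.0)]
selected = lepski_select(intervals)
print(f"adaptive estimate = {selected:.2f}")
```

The widest interval always satisfies the condition, so the loop returns a value as soon as the list is non-empty.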



Let us describe this idea more precisely. Let $\widehat{\theta}(v_0)$ be some estimator of the
mean depending on some assumed variance bound $v_0$, such as the ones described at
the beginning of this paper. Let $\delta(v_0, \epsilon) \in \mathbb{R}_+ \cup \{+\infty\}$ be the corresponding
confidence bound: namely let us assume that with probability at least $1 - 2\epsilon$,

$$|m - \widehat{\theta}(v_0)| \le \delta(v_0, \epsilon).$$

Presumably, except for distributions with bounded support, $\delta(v_0, 0) = +\infty$.

Let $\nu \in \mathcal{M}_+^1(\mathbb{R}_+)$ be some coding atomic sub-probability measure on the
positive real line, which will serve to take a union bound on a (countable) set of
possible values of $v_0$.

We can choose for instance for 1/ the following coding distribution : expressing 
Vq by comparison with some reference value V 

d 

vo = vr J2 Cfc2-^ seZ,defi, (c,)to e {0, co = = 1, 

fe=0 

we set u(vo) = + 2)(|s| +3)(d+ l){d + 2)2'^-^] ~\ and otherwise we set 
$\nu(v_0) = 0$. It is easy to see that this defines a subprobability distribution on $\mathbb{R}_+$
(supported by dyadic numbers scaled by the factor $V$). Clearly, the reference
value $V$ should be chosen as close as possible to the true
variance $v$.

Let us consider for any $v_0$ such that $\delta(v_0, \epsilon\nu(v_0)) < +\infty$ the confidence
interval

$$I(v_0) = \widehat{\theta}(v_0) + \delta[v_0, \epsilon\nu(v_0)] \times (-1, 1).$$

Let us put $I(v_0) = \mathbb{R}$ when $\delta(v_0, \epsilon\nu(v_0)) = +\infty$.

Let us consider the non-decreasing family of closed intervals

$$J(v_1) = \bigcap \bigl\{ I(v_0) : v_0 \ge v_1 \bigr\}, \quad v_1 \in \mathbb{R}_+.$$

A union bound shows immediately that with probability at least $1 - 2\epsilon$, $m \in J(v)$,
implying as a consequence that $J(v) \neq \emptyset$.

Proposition 6.1 Since $v_1 \mapsto J(v_1)$ is a non-decreasing family of closed
intervals, the intersection

$$\bigcap \bigl\{ J(v_1) : v_1 \in \mathbb{R}_+, \; J(v_1) \neq \emptyset \bigr\}$$

is a non-empty closed interval, and we can therefore pick an adaptive estimator
$\widehat{\theta}$ belonging to it, choosing for instance the middle of this interval.
With probability at least $1 - 2\epsilon$, $m \in J(v)$, which implies that $J(v) \neq \emptyset$, and
therefore that $\widehat{\theta} \in J(v)$.

Thus with probability at least $1 - 2\epsilon$

$$|m - \widehat{\theta}| \le |J(v)| \le 2 \inf_{v_0 \ge v} \delta(v_0, \epsilon\nu(v_0)).$$

If the confidence bound $\delta(v_0, \epsilon)$ is homogeneous, in the sense that

$$\delta(v_0, \epsilon) = \delta(1, \epsilon)\sqrt{v_0},$$

as is the case in Proposition 3.1 and Proposition 4.3 when used in conjunction
with Proposition 3.1, then with probability at least $1 - 2\epsilon$,

$$|m - \widehat{\theta}| \le 2 \inf_{v_0 \ge v} \delta(1, \epsilon\nu(v_0))\sqrt{v_0}.$$

Since usually $\epsilon \mapsto \delta(1, \epsilon)$ is quite flat in the high confidence region, as shown on
previous plots, we see that, in the high confidence region we are mostly interested
in in this paper, the order of magnitude of the adaptive confidence bound is not
much more than twice the value $\delta(v, \epsilon)$ of the confidence bound we would have
obtained for the estimator $\widehat{\theta}(v)$ which we could have used had we known the exact
value of the variance beforehand.



7. Worst case empirical mean deviations for a given kurtosis value

In the previous sections, we studied truncation techniques suited to various
prior hypotheses on the sample distribution. It is interesting to compare them to
the performance of the empirical mean estimator. This section is devoted to upper
bounds, whereas the next will study corresponding lower bounds.

When the variance is known and nothing else, it is easy to see, using
Chebyshev's inequality for the second moment, that the empirical mean

$$M = \frac{1}{n} \sum_{i=1}^n Y_i$$

is such that

$$P\Bigl(|M - m| \ge \sqrt{\frac{v}{2\epsilon n}}\Bigr) \le 2\epsilon. \tag{7.1}$$
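
To get a feeling for what the Chebyshev bound (7.1) costs at high confidence levels, the following sketch compares its half width $\sqrt{v/(2\epsilon n)}$ with the Gaussian benchmark $\sqrt{v/n}\,F^{-1}(1 - \epsilon)$, where $F$ is the standard normal distribution function. The function names and the grid of values of $\epsilon$ are illustrative choices of ours.

```python
import math
from statistics import NormalDist


def chebyshev_half_width(v: float, n: int, eps: float) -> float:
    """Half width of the interval holding with probability at least 1 - 2 * eps."""
    return math.sqrt(v / (2.0 * eps * n))


def gaussian_half_width(v: float, n: int, eps: float) -> float:
    """Half width for the empirical mean of a Gaussian sample, same level."""
    return math.sqrt(v / n) * NormalDist().inv_cdf(1.0 - eps)


n, v = 1000, 1.0
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    cheb = chebyshev_half_width(v, n, eps)
    gauss = gaussian_half_width(v, n, eps)
    print(f"eps = {eps:.0e}  Chebyshev = {cheb:9.4f}  Gaussian = {gauss:.4f}")
```

The Chebyshev width grows like $\epsilon^{-1/2}$ whereas the Gaussian one grows like $\sqrt{2\log(\epsilon^{-1})}$, so the ratio between the two becomes very large as $\epsilon$ decreases.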

The behaviour of the empirical mean for a given kurtosis is not so straightfor- 
ward. The following bound uses a truncation argument, allowing to study sepa- 
rately the behaviour of large and rare values. It is to our knowledge a new result. 
We will show later in this paper that its leading term is essentially tight (up to the 
(3/2)^/"^ multiplicative constant due to the union bound argument). 



Olivier Catoni 




Proposition 7.1 For any probability distribution whose kurtosis is not greater 
than K, the empirical mean M is such that with probability at least 1 — 2e, 



\M-m\ 21og(fe-i)v/^ /21og(fe-i 



5n y n 

JiK_\ f 3'{n-l)log(le-'y^ 12y21og(|6-^)3/2v/^ ^ 
^ ' 2en3 j 2500n2 ^ 25n3/2 

Let us also stress here the fact that estimating the variance under a kurtosis 
bound, using the empirical estimator 

n 

n ^-^ 

1=1 

of the moment of order two is likely to be unsuccessful at high confidence levels. 
Indeed, computing the quadratic mean 

we can only conclude, using Chebyshev's inequality, that with probability at least 

1 -2e 



a bound which breaks down at level of confidence $\epsilon = \frac{c - 1}{2n}$, and which we do
not suspect to be substantially improvable in the worst case. In contrast to this,
Proposition 5.2 provides a variance estimator at high confidence levels.



8. Lower bounds 

8.1. Lower bound for Gaussian distributions. This lower bound is 
well known. We recall it here for the sake of completeness. 

The empirical mean cannot be improved in the Gaussian case in the following 
precise sense. 

Proposition 8.1 For any estimator of the mean $\widehat{\theta} : \mathbb{R}^n \to \mathbb{R}$, any variance
value $v > 0$, and any deviation level $\eta > 0$, there is some Gaussian measure
$\mathcal{N}(m, v)$ (with variance $v$ and mean $m$) such that the i.i.d. sample of length $n$
drawn from this distribution is such that

$$P(\widehat{\theta} \ge m + \eta) \ge P(M \ge m + \eta) \quad \text{or} \quad P(\widehat{\theta} \le m - \eta) \ge P(M \le m - \eta),$$

where $M = \frac{1}{n} \sum_{i=1}^n Y_i$ is the empirical mean.

This means that any distribution free symmetric confidence interval based on 
the (supposedly known) value of the variance has to include the confidence in- 
terval for the empirical mean of a Gaussian distribution, whose length is exactly 
known and equal to the properly scaled quantile of the Gaussian measure. 

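Let us state this more precisely. With the notations of the previous proposition

$$P(M \ge m + \eta) = P(M \le m - \eta) = G\Bigl(\Bigl[\sqrt{\tfrac{n}{v}}\,\eta, +\infty\Bigr)\Bigr) = 1 - F\Bigl(\sqrt{\tfrac{n}{v}}\,\eta\Bigr),$$

where $G$ is the standard normal measure and $F$ its distribution function.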
The upper bounds proved in this paper can be decomposed into

$$P(\widehat{\theta} \ge m + \eta) \le \epsilon \quad \text{and} \quad P(\widehat{\theta} \le m - \eta) \le \epsilon,$$

although we preferred for simplicity to state them in the slightly weaker form
$P(|\widehat{\theta} - m| \ge \eta) \le 2\epsilon$.

As the Gaussian shift model, made of Gaussian distributions with a given
variance and varying means, is included in all the models we consider to state
bounds, we necessarily should have according to the previous proposition

$$\epsilon \ge 1 - F\Bigl(\sqrt{\tfrac{n}{v}}\,\eta\Bigr),$$

which can also be written as

$$\eta \ge \sqrt{\frac{v}{n}}\, F^{-1}(1 - \epsilon).$$

Therefore some visualisation of the quality of our bounds can be obtained by
plotting $\epsilon \mapsto \eta$ against $\epsilon \mapsto \sqrt{\frac{v}{n}}\, F^{-1}(1 - \epsilon)$, as we did in the previous sections.



8.2. Worst performance of the empirical mean for a given variance.
Another way to measure the quality of the bound is to compare it to the
empirical mean outside of the Gaussian shift model, where we have seen that
the deviations of the empirical mean are minimax at any confidence level. This is
done in the following proposition.

Proposition 8.2 For any value of the variance $v$, any deviation level $\eta > 0$,
there is some distribution with variance $v$ and mean $0$ such that the i.i.d. sample
of size $n$ drawn from it satisfies

$$P(M \ge \eta) = P(M \le -\eta) \ge \frac{v}{2n\eta^2}\Bigl(1 - \frac{v}{n^2\eta^2}\Bigr)^{n-1}.$$

Thus, as soon as $\epsilon \le (2e)^{-1}$, with probability at least $2\epsilon$,

$$|M - m| \ge \sqrt{\frac{v}{2n\epsilon}}\Bigl(1 - \frac{2e\epsilon}{n}\Bigr)^{\frac{n-1}{2}}.$$
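
A lower bound of the form $\frac{v}{2n\eta^2}\bigl(1 - \frac{v}{n^2\eta^2}\bigr)^{n-1}$ can be checked numerically on a simple heavy tailed candidate. The three-point distribution used below (mass $p = v/(2n^2\eta^2)$ at each of $\pm n\eta$, the rest at $0$) is an assumption made for this illustration, not a construction quoted from the text, and the function name is hypothetical. On the event that exactly one observation equals $+n\eta$ and all the others are $0$, the empirical mean is equal to $\eta$.

```python
def three_point_lower_bound(v: float, n: int, eta: float):
    """Variance of the three-point law and probability of the event
    'exactly one sample equals +n*eta, the n-1 others equal 0'."""
    p = v / (2.0 * n ** 2 * eta ** 2)  # mass at +n*eta and at -n*eta
    if 2.0 * p > 1.0:
        raise ValueError("eta is too small for this construction")
    variance = 2.0 * p * (n * eta) ** 2  # the mean is 0 by symmetry
    prob_one_spike = n * p * (1.0 - 2.0 * p) ** (n - 1)
    return variance, prob_one_spike


v, n, eta = 1.0, 1000, 1.0
variance, prob = three_point_lower_bound(v, n, eta)

# Closed form of the same quantity.
closed_form = v / (2.0 * n * eta ** 2) * (1.0 - v / (n ** 2 * eta ** 2)) ** (n - 1)

print(f"variance              = {variance:.6f}")
print(f"P(M >= eta) at least  = {prob:.6e}")
print(f"closed form           = {closed_form:.6e}")
```

For $n = 1000$, $v = 1$ and $\eta = 1$, this probability is close to $v/(2n\eta^2) = 5 \cdot 10^{-4}$, which is half of the two-sided Chebyshev bound $v/(n\eta^2)$.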



Let us remark that this bound is pretty tight, as shown in the next plot, since,
according to equation (7.1), with probability at least $1 - 2\epsilon$,

$$|M - m| \le \sqrt{\frac{v}{2n\epsilon}}.$$



[Figure: $n = 1000$, $v_0 = 1$. Curves: empirical mean in the Gaussian case;
empirical mean upper bound for the worst distribution of unit variance;
empirical mean lower bound for the worst distribution of unit variance.
Horizontal axis: $\epsilon$, starting from 0.5, confidence level $= 1 - 2\epsilon$.]



8.3. Worst performance of the empirical mean for a given kurtosis.

Proposition 8.3 For any c > \ + \/n, and any e < (4e)~^, there is a proba- 
bility measure on the real line, with uniform kurtosis equal to c and unit variance, 
such that with probability at least 2t, 

1/4 / 4ee\(""^)/^ 



M - m > 



c- 1 



n J 



Let us plot this lower bound as well as the corresponding upper bound given
by Proposition 7.1, for a sample of size $n = 2000$ and a kurtosis $c = 6$.
The space between the two curves is of moderate size, showing that we got the
order of magnitude right in these bounds.

[Figure: $n = 2000$, $v_0 = 1$, $c = 6$. Curves: empirical mean in the Gaussian
case; empirical mean upper bound for the worst distribution of unit variance
and given kurtosis; empirical mean lower bound for the worst distribution of
unit variance and given kurtosis. Horizontal axis: $\epsilon$, starting from
0.5, confidence level $= 1 - 2\epsilon$.]

9. Proofs

9.1. Proof of Proposition 1.1. Let us start with some bounds for
the map $x \mapsto \log(1 + x + \frac{x^2}{2})$.



Lemma 9.1 The map $x \mapsto \log(1 + x + \frac{x^2}{2})$ satisfies for any $x \in \mathbb{R}$,



x^ / x^ 

< loff 1 + X H 

38 - 2 



S 4 

X X 

- X H < — . 

6-6 



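Proof. Let us consider for some positive real parameter $a$ the function

\[ f(x) = \log\Bigl(1 + x + \frac{x^2}{2}\Bigr) - x + \frac{x^3}{6} - \frac{a x^4}{8}. \]

We can study its sign through its derivative

\[ f'(x) = \frac{1+x}{1+x+\frac{x^2}{2}} - 1 + \frac{x^2}{2} - \frac{a x^3}{2} = \frac{x^2\bigl(x + \frac{x^2}{2}\bigr)}{2\bigl(1+x+\frac{x^2}{2}\bigr)} - \frac{a x^3}{2} \]
\[ = \frac{x^3\bigl(1 + \frac{x}{2} - a - a x - \frac{a x^2}{2}\bigr)}{2\bigl(1+x+\frac{x^2}{2}\bigr)} = - \frac{x^3\bigl[(a-1) + \bigl(a - \frac12\bigr)x + \frac{a x^2}{2}\bigr]}{2\bigl(1+x+\frac{x^2}{2}\bigr)}. \]

When $\bigl(a - \frac12\bigr)^2 - 2a(a-1) \le 0$, $f'(x)$ has the same sign as $-x$, showing that $\sup_{\mathbb{R}} f = 0$, since $f(0) = 0$. This condition can also be written as $a^2 - a - \frac14 \ge 0$, and is fulfilled when $a = \frac{1+\sqrt2}{2}$. Thus

\[ \log\Bigl(1 + x + \frac{x^2}{2}\Bigr) - x + \frac{x^3}{6} \le \frac{(1+\sqrt2)x^4}{16} \le \frac{x^4}{6}, \quad x \in \mathbb{R}. \]

Let us proceed to the lower bound now. Consider the same computations as above, but with a negative parameter $a$. In this case, under the same discriminant condition, $f'(x)$ has the same sign as $x$, showing that $\inf_{\mathbb{R}} f = 0$. For the lower bound, we can thus take $a = \frac{1-\sqrt2}{2}$, proving that

\[ \log\Bigl(1 + x + \frac{x^2}{2}\Bigr) - x + \frac{x^3}{6} \ge - \frac{(\sqrt2 - 1)x^4}{16} \ge - \frac{x^4}{38}, \quad x \in \mathbb{R}. \]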

□ 
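
The two inequalities of Lemma 9.1 are easy to sanity check numerically. The sketch below evaluates both sides on a finite grid; it is only an illustration of the statement (the grid, the tolerance and the function names are arbitrary choices of ours), not a substitute for the proof.

```python
import math


def log_truncated_exp(x):
    """log(1 + x + x^2/2); the argument is always at least 1/2."""
    return math.log(1 + x + x * x / 2)


def lemma_lower(x):
    return x - x ** 3 / 6 - x ** 4 / 38


def lemma_upper(x):
    return x - x ** 3 / 6 + x ** 4 / 6


def lemma_holds_on(points, tol=1e-12):
    """Check lower <= log(1 + x + x^2/2) <= upper at every point, up to rounding."""
    return all(
        lemma_lower(x) - tol <= log_truncated_exp(x) <= lemma_upper(x) + tol
        for x in points
    )


grid = [k / 100 for k in range(-1000, 1001)]  # x in [-10, 10], step 0.01
print(lemma_holds_on(grid))
```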

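We will also need the following property of the truncated exponential function $1 + x + \frac{x^2}{2} \simeq \exp(x)$:

\[ -\log\Bigl(1 - x + \frac{x^2}{2}\Bigr) = \log\Biggl(\frac{1 + x + \frac{x^2}{2}}{1 + \frac{x^4}{4}}\Biggr) \le \log\Bigl(1 + x + \frac{x^2}{2}\Bigr), \quad x \in \mathbb{R}, \qquad (9.1) \]

so that

\[ -\log\Bigl(1 - x + \frac{x^2}{2}\Bigr) = T_-(x) \le T(x) \le T_+(x) = \log\Bigl(1 + x + \frac{x^2}{2}\Bigr). \]

Accordingly

\[ \hat\theta_\alpha(\theta_0) \le \theta_0 + \frac{1}{n\alpha}\sum_{i=1}^n T_+\bigl[\alpha(Y_i - \theta_0)\bigr]. \]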



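We can then compute the exponential moment

\[ \mathbb{E}\Bigl\{\exp\Bigl[\sum_{i=1}^n T_+\bigl[\alpha(Y_i - \theta_0)\bigr]\Bigr]\Bigr\} = \exp\Bigl\{n\log\Bigl(1 + \alpha(m-\theta_0) + \frac{\alpha^2}{2}\bigl[v + (m-\theta_0)^2\bigr]\Bigr)\Bigr\}. \]

From the exponential Chebyshev inequality $\mathbb{P}\Bigl(\frac{\exp(X)}{\mathbb{E}[\exp(X)]} \ge \exp(\eta)\Bigr) \le \exp(-\eta)$, considering $\epsilon = \exp(-\eta)$, we deduce that with probability at least $1 - \epsilon$,

\[ \hat\theta_\alpha(\theta_0) \le \theta_0 + \frac{1}{n\alpha}\sum_{i=1}^n \log\Bigl(1 + \alpha(m-\theta_0) + \frac{\alpha^2}{2}\bigl[v + (m-\theta_0)^2\bigr]\Bigr) + \frac{\log(\epsilon^{-1})}{n\alpha}. \qquad (9.2) \]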



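Let us now remark that for any $x \in \mathbb{R}$, any $y \in \mathbb{R}_+$, according to Lemma 9.1 (page 23),

\[ \log\Bigl(1 + x + \frac{x^2}{2} + y\Bigr) = \log\Bigl(1 + x + \frac{x^2}{2}\Bigr) + \log\Bigl(1 + \frac{y}{1 + x + \frac{x^2}{2}}\Bigr) \]
\[ \le x - \frac{x^3}{6} + \frac{x^4}{6} + \frac{y}{1 + \frac{x^4}{4}}\Bigl(1 - x + \frac{x^2}{2}\Bigr) \le x - \frac{x^3}{6} + \frac{x^4}{6} + y - xy + \frac{x^2 y}{2}. \]

Thus

\[ \hat\theta_\alpha(\theta_0) \le m + \frac{\alpha v}{2} + \frac{\log(\epsilon^{-1})}{n\alpha} - \frac{\alpha^2(m-\theta_0)^3}{6} + \frac{\alpha^3(m-\theta_0)^4}{6} - \frac{\alpha^2 v}{2}(m-\theta_0)\Bigl(1 - \frac{\alpha(m-\theta_0)}{2}\Bigr). \]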

In the same way, considering $\theta_0 - Y_i$ instead of $Y_i - \theta_0$ and using the symmetry of $T(x)$, we get with probability at least $1 - \epsilon$

\[ \hat\theta_\alpha(\theta_0) \ge m - \frac{\alpha v}{2} - \frac{\log(\epsilon^{-1})}{n\alpha} - \frac{\alpha^2(m-\theta_0)^3}{6} - \frac{\alpha^3(m-\theta_0)^4}{6} - \frac{\alpha^2 v}{2}(m-\theta_0)\Bigl(1 + \frac{\alpha(m-\theta_0)}{2}\Bigr). \]

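Therefore with probability at least $1 - 2\epsilon$,

\[ |\hat\theta_\alpha(\theta_0) - m| \le \frac{\alpha v}{2} + \frac{\log(\epsilon^{-1})}{n\alpha} + \frac{\alpha^2|m-\theta_0|^3}{6}\bigl(1 + \alpha|m-\theta_0|\bigr) + \frac{\alpha^2|m-\theta_0|\,v}{2}\Bigl(1 + \frac{\alpha|m-\theta_0|}{2}\Bigr) \]
\[ \le \frac{\alpha v}{2} + \frac{\log(\epsilon^{-1})}{n\alpha} + \frac{\alpha^2|m-\theta_0|}{2}\bigl(1 + \alpha|m-\theta_0|\bigr)\Bigl(\frac{(m-\theta_0)^2}{3} + v\Bigr). \]

Specifically, when $\alpha = \sqrt{\frac{2\log(\epsilon^{-1})}{n v_0}}$, with $v_0 \ge v$,

\[ |\hat\theta_\alpha(\theta_0) - m| \le \sqrt{\frac{2 v_0\log(\epsilon^{-1})}{n}} + \frac{\log(\epsilon^{-1})\,|m-\theta_0|}{3 n v_0}\bigl[(m-\theta_0)^2 + 3v_0\bigr]\Biggl(1 + |m-\theta_0|\sqrt{\frac{2\log(\epsilon^{-1})}{n v_0}}\Biggr). \]

If we prefer to keep the estimator $\hat\theta_\alpha(\theta_0)$ independent from the confidence level $1 - \epsilon$, we can choose $\alpha = \sqrt{\frac{2}{n v_0}}$ and obtain for this value and with probability at least $1 - 2\epsilon$,

\[ |\hat\theta_\alpha(\theta_0) - m| \le \bigl[1 + \log(\epsilon^{-1})\bigr]\sqrt{\frac{v_0}{2n}} + \frac{|m-\theta_0|}{3 n v_0}\bigl[(m-\theta_0)^2 + 3v_0\bigr]\Biggl(1 + |m-\theta_0|\sqrt{\frac{2}{n v_0}}\Biggr). \]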



9.2. Proof of Proposition 1.2 (page 5). If $|m - \theta_0|$ is already small, or if you are aiming at an iterative scheme, you can be content with the inequality

\[ |\hat\theta_\alpha(\theta_0) - m| \le \frac{\alpha}{2}\bigl[v + (m-\theta_0)^2\bigr] + \frac{\log(\epsilon^{-1})}{n\alpha}, \]

which holds with probability at least $1 - 2\epsilon$. This is a consequence of Equation (9.2 page 25) and the coarse inequality $\log(1 + x) \le x$.
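
The estimator analysed in these two proofs can be sketched in a few lines. The code below assumes that $T$ is taken to be the widest odd function compatible with $T_- \le T \le T_+$, namely $T(x) = \log(1 + x + x^2/2)$ for $x \ge 0$ and $T(x) = -\log(1 - x + x^2/2)$ for $x < 0$; this choice, like the function names, is an assumption made for illustration, since any $T$ between $T_-$ and $T_+$ satisfies the bounds above.

```python
import math


def influence(x):
    """Odd truncation T with T_-(x) <= T(x) <= T_+(x) (assumed choice, see lead-in)."""
    if x >= 0:
        return math.log1p(x + 0.5 * x * x)
    return -math.log1p(-x + 0.5 * x * x)


def truncated_mean(sample, theta0, alpha):
    """theta0 + (1 / (n * alpha)) * sum_i T[alpha * (Y_i - theta0)]."""
    n = len(sample)
    return theta0 + sum(influence(alpha * (y - theta0)) for y in sample) / (n * alpha)


def alpha_for_level(n, v0, eps):
    """alpha = sqrt(2 log(1/eps) / (n v0)), the choice made in Section 9.1."""
    return math.sqrt(2 * math.log(1 / eps) / (n * v0))


def coarse_deviation_bound(n, alpha, v0, delta0, eps):
    """Bound of Section 9.2, with v <= v0 and |m - theta0| <= delta0."""
    return 0.5 * alpha * (v0 + delta0 ** 2) + math.log(1 / eps) / (n * alpha)


if __name__ == "__main__":
    sample = [0.0] * 99 + [1e6]  # one gross outlier
    print(sum(sample) / len(sample))                    # empirical mean: 10000.0
    print(truncated_mean(sample, theta0=0.0, alpha=1.0))  # stays of order 1
```

The logarithmic growth of $T$ is what keeps a single very large observation from dominating the estimate.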



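9.3. Proof of Proposition 2.1 (page 5). We are going here to iterate the use of Proposition 1.2 (page 5). Applying it once, to start with, we get that with probability at least $1 - 2\epsilon_1$,

\[ |\hat\theta_1 - m| \le \delta_1. \]

Let

\[ \tilde\theta_2 = \begin{cases} \hat\theta_1 + \delta_1 x_2 U_2, & \text{when } |\hat\theta_1 - m| \le \delta_1, \\ m + \delta_1 x_2 U_2, & \text{otherwise.} \end{cases} \]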

We are going to use some PAC-Bayesian theorem to overcome the fact that the sequence of estimators $\hat\theta_i$ is computed on the same sample.

Let us consider the prior distribution $\pi_1$ defined as the uniform probability measure on the interval $m + (1 + x_2)\delta_1 \times (-1, +1)$. Let $\rho_1$ be the conditional distribution of $\tilde\theta_2$ knowing the sample. From the definition of $\tilde\theta_2$, we see that for any value of the sample the support of $\rho_1$ is included in the support of $\pi_1$, and therefore that $\rho_1$ is absolutely continuous with respect to $\pi_1$, with density

\[ \frac{d\rho_1}{d\pi_1} = 1 + x_2^{-1}, \quad \rho_1 \text{ almost surely.} \]
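Let us define the family of random variables

\[ X(\theta) = \sum_{i=1}^n T_+\bigl[\alpha_2(Y_i - \theta)\bigr] - n\log\Bigl(1 + \alpha_2(m-\theta) + \frac{\alpha_2^2}{2}\bigl[v + (m-\theta)^2\bigr]\Bigr). \]

Integrating with respect to $\rho_1$, and using Fubini's theorem we get

\[ \mathbb{E}\Bigl\{\int\rho_1(d\theta)\exp\bigl[X(\theta) - \log(1 + x_2^{-1})\bigr]\Bigr\} = \mathbb{E}\Bigl\{\int\rho_1(d\theta)\,\mathbb{1}\Bigl(\frac{d\rho_1}{d\pi_1}(\theta) > 0\Bigr)\exp\Bigl[X(\theta) - \log\Bigl(\frac{d\rho_1}{d\pi_1}(\theta)\Bigr)\Bigr]\Bigr\} \]
\[ = \mathbb{E}\Bigl\{\int\pi_1(d\theta)\,\mathbb{1}\Bigl(\frac{d\rho_1}{d\pi_1}(\theta) > 0\Bigr)\exp[X(\theta)]\Bigr\} \le \mathbb{E}\Bigl\{\int\pi_1(d\theta)\exp[X(\theta)]\Bigr\} = \int\pi_1(d\theta)\,\mathbb{E}\bigl\{\exp[X(\theta)]\bigr\} = 1. \]

We can now use the fact that $\mathbb{P}\rho_1$ is the joint distribution of the sample and of $\tilde\theta_2$ and Chebyshev's exponential inequality, to prove with probability at least $1 - \epsilon_2$ that

\[ X(\tilde\theta_2) \le \log(\epsilon_2^{-1}) + \log(1 + x_2^{-1}). \]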
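As

\[ \hat\theta_{\alpha_2}(\tilde\theta_2) \le \tilde\theta_2 + \frac{X(\tilde\theta_2)}{n\alpha_2} + \frac{1}{\alpha_2}\log\Bigl(1 + \alpha_2(m-\tilde\theta_2) + \frac{\alpha_2^2}{2}\bigl[v + (m-\tilde\theta_2)^2\bigr]\Bigr) \le m + \frac{\alpha_2}{2}\bigl[v + (m-\tilde\theta_2)^2\bigr] + \frac{X(\tilde\theta_2)}{n\alpha_2}, \]

we deduce that with probability at least $1 - \epsilon_2$,

\[ \hat\theta_{\alpha_2}(\tilde\theta_2) - m \le \frac{\alpha_2}{2}\bigl[v + (m-\tilde\theta_2)^2\bigr] + \frac{\log(\epsilon_2^{-1}) + \log(1 + x_2^{-1})}{n\alpha_2} \le \delta_2. \]

We can prove in the same way that with probability at least $1 - \epsilon_2$,

\[ m - \hat\theta_{\alpha_2}(\tilde\theta_2) \le \delta_2. \]

We deduce that with probability at least $1 - 2\epsilon_2$,

\[ |m - \hat\theta_{\alpha_2}(\tilde\theta_2)| \le \delta_2. \]

Moreover, we see from the definition of $\tilde\theta_2$ that with probability at least $1 - 2\epsilon_1$, $\hat\theta_2 = \hat\theta_{\alpha_2}(\tilde\theta_2)$, therefore with probability at least $1 - 2(\epsilon_1 + \epsilon_2)$,

\[ |m - \hat\theta_2| \le \delta_2. \]

The induction carries on in the same way. Assuming that with probability at least $1 - 2\sum_{i=1}^{k-1}\epsilon_i$, $|m - \hat\theta_{k-1}| \le \delta_{k-1}$, we deduce that with probability at least $1 - 2\sum_{i=1}^{k}\epsilon_i$, $|m - \hat\theta_k| \le \delta_k$.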



9.4. Proof of Proposition 3.1 (page 7). The proof is the same as the previous one, except for the first step, which is a consequence of the Chebyshev inequality, applied to the second moment of the empirical mean:

\[ \mathbb{P}\bigl(|\hat\theta_1 - m| \ge \delta_1\bigr) \le \frac{\mathbb{E}\bigl[(\hat\theta_1 - m)^2\bigr]}{\delta_1^2} \le 2\epsilon_1. \]



9.5. Proof of Proposition 4.1 (page 9). Jensen's inequality for convex functions can serve to pull the integration with respect to $\rho_{\theta_0}$ out of the logarithm. Using moreover Equation (9.1 page 24), we get the following chain of inequalities:



naMM < -/ P9oid9) log 



i=l 

n 

i=l 



C\ 9 

l-a(0-F,) + y(^->^O 



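To proceed, let us consider the empirical process

\[ W(\theta) = \sum_{i=1}^n \log\Bigl[1 + \alpha(\theta - Y_i) + \frac{\alpha^2}{2}(\theta - Y_i)^2\Bigr]. \]

It satisfies

\[ \mathbb{E}\bigl\{\exp[W(\theta)]\bigr\} = \mathbb{E}\Bigl\{\prod_{i=1}^n\Bigl[1 + \alpha(\theta - Y_i) + \frac{\alpha^2}{2}(\theta - Y_i)^2\Bigr]\Bigr\} = \Bigl\{1 + \alpha(\theta - m) + \frac{\alpha^2}{2}\bigl[(\theta-m)^2 + \mathbb{E}[(Y-m)^2]\bigr]\Bigr\}^n. \]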



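Thus if we put

\[ w(\theta) = n\log\Bigl\{1 + \alpha(\theta - m) + \frac{\alpha^2}{2}\bigl[(\theta - m)^2 + v\bigr]\Bigr\}, \]

we see that

\[ \mathbb{E}\bigl\{\exp[W(\theta) - w(\theta)]\bigr\} \le 1 \]

(with equality when $\mathbb{E}[(Y - m)^2] = v$). We can then follow the usual PAC-Bayesian route, choosing as reference measure $\rho_m$. This consists in the inequalities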

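\[ \mathbb{E}\Bigl\{\exp\Bigl[\sup_{\theta_0 \in \mathbb{R}}\int\rho_{\theta_0}(d\theta)\bigl[W(\theta) - w(\theta)\bigr] - \mathcal{K}(\rho_{\theta_0}, \rho_m)\Bigr]\Bigr\} \le \mathbb{E}\Bigl\{\int\rho_m(d\theta)\exp\bigl[W(\theta) - w(\theta)\bigr]\Bigr\} \]
\[ = \int\rho_m(d\theta)\,\mathbb{E}\bigl\{\exp\bigl[W(\theta) - w(\theta)\bigr]\bigr\} \le 1, \]

where we have used Fubini's theorem and the convex inequality

\[ \sup_{\rho}\Bigl\{\int\rho(d\theta)\bigl[W(\theta) - w(\theta)\bigr] - \mathcal{K}(\rho, \rho_m)\Bigr\} = \log\Bigl\{\int\rho_m(d\theta)\exp\bigl[W(\theta) - w(\theta)\bigr]\Bigr\}. \]

(See [4, page 159] for a proof.)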

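From the exponential Chebyshev inequality $\mathbb{P}[X \ge \eta] \le \mathbb{E}[\exp(X - \eta)]$, it follows that with probability at least $1 - \epsilon$, for any $\theta_0 \in \mathbb{R}$,

\[ \int\rho_{\theta_0}(d\theta)W(\theta) \le \int\rho_{\theta_0}(d\theta)w(\theta) + \mathcal{K}(\rho_{\theta_0}, \rho_m) - \log(\epsilon). \]

We can then remark that $\mathcal{K}(\rho_{\theta_0}, \rho_m) = \frac{n\alpha^2\beta}{2}(\theta_0 - m)^2$ and that

\[ \int\rho_{\theta_0}(d\theta)w(\theta) \le n\int\rho_{\theta_0}(d\theta)\Bigl\{\alpha(\theta - m) + \frac{\alpha^2}{2}\bigl[(\theta - m)^2 + v\bigr]\Bigr\} = n\alpha(\theta_0 - m) + \frac{n\alpha^2}{2}\bigl[(\theta_0 - m)^2 + v\bigr] + \frac{1}{2\beta} \]

to conclude that with probability at least $1 - \epsilon$, for any $\theta_0 \in \mathbb{R}$,

\[ \int\rho_{\theta_0}(d\theta)W(\theta) \le n\alpha(\theta_0 - m) + \frac{n\alpha^2}{2}\bigl[(1+\beta)(\theta_0 - m)^2 + v\bigr] + \frac{1}{2\beta} - \log(\epsilon). \]

As we have already established that $-n\alpha M_\alpha(\theta_0) \le \int\rho_{\theta_0}(d\theta)W(\theta)$, this completes the proof of the proposition.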



9.6. Proof OF Proposition 



4.2 



E 



|exp[naMa(6'o 



(PAGE[9j) . It is straightforward to realize that 



a . „ 11 

2^J ■ 



<<l- a{9o - m) + 



The result then follows as in the previous proof from the exponential Chebyshev 
inequality. 



9.7. Proof of Proposition 4.3 (page 10). We will need the following elementary lemma.

Lemma 9.2 For any positive real constants $a$ and $c$ such that $4ac < 1$,

\[ \bigl\{x \in \mathbb{R} : x \ge a x^2 + c\bigr\} = \Bigl[\frac{2c}{1 + \sqrt{1 - 4ac}}, \frac{1 + \sqrt{1 - 4ac}}{2a}\Bigr] \supset \Bigl[\frac{c}{1 - 2ac}, \frac{1 - 2ac}{a}\Bigr]. \]

Using Proposition 4.1 (page 9) and the previous lemma, we see that with probability $1 - \epsilon_2$



2 ( {1 + (3)[na^v + f3-' - 2\og{e2 

m < &a + — r,\ V 



or m> 6a + 



{l + f3)a\ An 



2na 



X LP 



'l + P)[na\ + p-^ + 2\og{e:,')\ 
An 



^ 2 {I +(5)[na''v + (5-^ -2\og{e2)] 

- + 2n 



Let us make sure that the second condition cannot be fulfilled when \6i—m\ < 5i, 
assuming that 



62 > exp< — n 



1 ^ (na'^v + (3-^) 
— adi 



2n 



or more accurately that 

Ana5i < [na^v + + 21og(e2 ^)] 

X If 



[I + (3) [na'^v + 13-^ + 2\og{q^)\ 



An 



In this case, with probability at least 1 — 62, either \6i — m\ > 5i ox 



2 I [I + (3)\na^v + -2\og{e 



On the other hand, let us consider 



An 



^ , 2 (na^v ^ (3-^ -2\og{e2) 



From Proposition 4.2 (page [9]), with probability at least 1 — £2, M{Oq) < 0, and 
therefore 

^ 2 f{l + (3)[na^v + (3-'-2log{e2 

"a ^ "0 + — n\ ^ 



Thus with probability at least 1 — 2e2, either \9i — m\ > Sior 

2 f {1 + (3) \na^v + (3-^ - 2 \og{e2 
\0n — m\ < — — {p ' 



4n 



This proves the first part of the proposition. The consequences drawn from spe- 
cial choices of a are obvious, except for the last condition which may require 

'r^-21og(62)^'/' 



some verification: when a 



nv 



, putting 7 = /3 ^ - 2 log(e2) 



condition (4.5 page 10) becomes



nv 



1/2 



(1 + /?) V7 
This can also be written as 

ii+/3) , (i+m 



^ (i+/g)7 



n 



n 



-7 + 



nv 



7-l<0, 



which is a second order inequality in Considering that > 0, its solution 
is 

2 



'nv s nv n 
To simplify formulas, we can remark that this inequality is satisfied when 



that is when 



t2 > exp 



n 



nv 



2p 2{l + (3y6t + 8{l + (3)v 



9.8. Proof of Lemma 5.1 (page 12). Using the fact that the $L_2$ norm of a sum is less than the sum of the norms, and the definition of the kurtosis, we get



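\[ \mathbb{E}(Y^4) = \mathbb{E}\bigl\{[(Y-m)^2 + m(2Y - m)]^2\bigr\} \le \bigl\{\mathbb{E}[(Y-m)^4]^{1/2} + |m|\,\mathbb{E}[(2Y-m)^2]^{1/2}\bigr\}^2 \]
\[ \le \bigl\{\kappa^{1/2}\mathbb{E}[(Y-m)^2] + |m|\,\mathbb{E}\{[2(Y-m) + m]^2\}^{1/2}\bigr\}^2 = \bigl\{\kappa^{1/2}[\mathbb{E}(Y^2) - m^2] + |m|\,[4\mathbb{E}(Y^2) - 3m^2]^{1/2}\bigr\}^2. \]

Introducing $y = \frac{m^2}{\mathbb{E}(Y^2)}$, this gives

\[ \Bigl(\frac{\mathbb{E}(Y^4)}{\mathbb{E}(Y^2)^2}\Bigr)^{1/2} \le \kappa^{1/2}(1 - y) + y^{1/2}(4 - 3y)^{1/2}. \]

Let us consider the function $f : (0, 1) \to \mathbb{R}$ defined as

\[ f(x) = \kappa^{1/2}(1 - x) + x^{1/2}(4 - 3x)^{1/2}. \]

It reaches its maximum at point $x$ satisfying $f'(x) = 0$, that is

\[ -\kappa^{1/2} + \frac12 x^{-1/2}(4 - 3x)^{1/2} - \frac32 x^{1/2}(4 - 3x)^{-1/2} = 0. \]

Therefore $x$ satisfies $\kappa x(4 - 3x) = (2 - 3x)^2$, or $3x^2 - 4x + \frac{4}{\kappa + 3} = 0$. Thus $x = \frac23\Bigl(1 - \sqrt{\frac{\kappa}{\kappa+3}}\Bigr)$, and

\[ \sup_{y \in (0,1)} f(y) = \kappa^{1/2}\Bigl(\frac13 + \frac23\sqrt{\frac{\kappa}{\kappa+3}}\Bigr) + \frac{2}{\sqrt{\kappa+3}} = \frac23\sqrt{\kappa+3} + \frac13\sqrt{\kappa}. \]

This proves that

\[ \frac{\mathbb{E}(Y^4)}{\mathbb{E}(Y^2)^2} \le \frac19\bigl(\sqrt{\kappa} + 2\sqrt{\kappa+3}\bigr)^2 = \frac19\bigl(5\kappa + 12 + 4\sqrt{\kappa(\kappa+3)}\bigr) \le \kappa + 2. \]

The same is of course true for $Y - \theta$ for any shift $\theta$, as a mere change of notations shows, proving the first assertion of the lemma.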

Consider now for the lower bound the Bernoulli distribution with parameter $p$. In this case

\[ \mathbb{E}[(Y-m)^2] = p(1-p)^2 + (1-p)p^2 = p(1-p), \]
\[ \mathbb{E}[(Y-m)^4] = p(1-p)^4 + (1-p)p^4 = p(1-p)(1 - 3p + 3p^2), \]

thus

\[ \kappa = \frac{1 - 3p + 3p^2}{p(1-p)} = p^{-1} - 2 + \frac{p}{1-p}. \]

Moreover $\frac{\mathbb{E}(Y^4)}{\mathbb{E}(Y^2)^2} = p^{-1} \le c$, and thus

\[ c - \kappa \ge 2 - \frac{p}{1-p}. \]

While $p$ tends to zero, this proves that $\sup c_P - \kappa_P \ge 2$, and therefore, due to the already proved upper bound, that $\sup c_P - \kappa_P = 2$.
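
The Bernoulli computation can be replayed in exact rational arithmetic. The short sketch below (function names are ours) recomputes the kurtosis from the centered moments and checks the identity $p^{-1} - \kappa = 2 - p/(1-p)$ used above.

```python
from fractions import Fraction


def bernoulli_kurtosis(p):
    """E[(Y-m)^4] / E[(Y-m)^2]^2 for a Bernoulli(p) variable, m = p."""
    variance = p * (1 - p) ** 2 + (1 - p) * p ** 2
    fourth = p * (1 - p) ** 4 + (1 - p) * p ** 4
    return fourth / variance ** 2


def uncentered_ratio(p):
    """E(Y^4) / E(Y^2)^2 = p / p^2, a lower bound for the uniform kurtosis c."""
    return 1 / p


for k in (10, 100, 1000):
    p = Fraction(1, k)
    gap = uncentered_ratio(p) - bernoulli_kurtosis(p)
    print(p, gap, float(gap))  # the gap increases towards 2 as p decreases
```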

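When the skewness is null, that is when $\mathbb{E}[(Y - m)^3] = 0$, we can write, assuming without loss of generality that $m = 0$,

\[ \mathbb{E}[(Y+\theta)^4] = \mathbb{E}(Y^4) + 6\theta^2\mathbb{E}(Y^2) + \theta^4 = \kappa\bigl\{\mathbb{E}[(Y+\theta)^2] - \theta^2\bigr\}^2 + 6\theta^2\bigl\{\mathbb{E}[(Y+\theta)^2] - \theta^2\bigr\} + \theta^4. \]

Thus, introducing $y = \frac{\theta^2}{\mathbb{E}[(Y+\theta)^2]}$, we see that

\[ c = \sup_{\theta \in \mathbb{R}}\frac{\mathbb{E}[(Y+\theta)^4]}{\mathbb{E}[(Y+\theta)^2]^2} = \sup_{y \in (0,1)}\kappa(1-y)^2 + 6y(1-y) + y^2 \]
\[ = \sup_{y \in (0,1)}\kappa - 2(\kappa - 3)y + (\kappa - 5)y^2 = \begin{cases} \kappa + \frac{(3-\kappa)^2}{5-\kappa}, & \kappa \le 3, \\ \kappa, & \kappa \ge 3. \end{cases} \]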



9.9. Proof of Proposition 5.2 (page 13). We are going to build a non observable variant of the construction made in the proposition, for which the conclusions of the proposition are always fulfilled because we enforced them.

Remember that the sequences 6i, % and (21-1 take non random values, and let 
us define 



9i = 9i, 

f 5iexp(-Ci; 



when thus defined satisfies 

|log(gi) - \og\v + (m - I < Ci, 
otherwise. 
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Q2 = giexp(x2Cit^2) 



a2 



C2 = exp 



2^2 [log(e2 ^) + 72 



n 



(^1)) when 6*2 thus defined satisfies 
\m - < C2, 
m, otherwise, 



6*24-1 — d2i~2 + C2i-2^2j-l 

f (52i-iexp(-C2i-i^ 



. , X o when g2i-i thus defined satisfies 

Ql,,_iA,_J-(c-l)52Vl] 

|log(fe-l) - ^^A^ + (m - ^2i-l)^] I < C2i-1, 

t> + (m — 6'2i-i)^, otherwise, 
= fe-i exp(x2iC2i-it/2i), 

(1 + X2i)C,2i-\ 



a2i = exp 



C2i = exp 



'2[log(e2-/)+72.: 



^g2i 



(1 + X2i)C2i-l 



'2fe [log(e2/) + 72*] 



77, 



6'«2i (^2i-i), when 621 thus defined satisfies 

72i = \ \m- e2i\ < C2i- 



otherwise. 



By construction, these modified quantities are such that for any i = 1, 

\m - 92i\ < C2i, 

|log(fe-l) - log[^ + (m - ^2i-l)^] I < C2i-1- 

Let us define "the modified sequence"



S2j-1 
S2j 



2i- 



2i- 



-i)t2'Mfe)j-:;| = ^2,-2u{02i.-i}, 

^i)l2' = ^2,-1 U {log(g2,)}- 



, k. 



Olivier Catoni 






The first step of the proof will be to prove by induction on j the following lemma. 

Lemma 9.3 There exists some prior distribution nj on the modified sequence 
Sj (that is some non random probability measure on such that the joint 

conditional distribution pj of the modified sequence Sj knowing the sample (^)"=i 

is such that log f -f^ ] < 7,. 

Proof. Indeed, assuming that this is true for 2j — 2, we build 7r2j_i and 
from vr2j_2 by deciding that vr2j_2 is the marginal of n2j-i on 5*2^-2 and that 7i2j-i 
is the marginal of on 5*2^-1. We complete the definition of n2j-i by defining 
the conditional distribution of ^2j-i knowing 5'2j-2 under 712 j-i as the uniform 
probability distribution on the interval 

m + (1 + X2j-i)C2j-2 X +!)• 

Similarly we complete the definition of 7r2j by defining the conditional distribution 
of \og{q2j) knowing S2j-2 U {^2j-i} as the uniform probability distribution on the 
interval 

log[^; + (m - 92j-if]+{l + X2j)C2j-i x (-1, +1). 

As the conditional distribution of ^2j-i and log(^2j) knowing S2j-2 and the sam- 
ple (5^i)f=i is the product of the uniform probability measure on the interval 

92j-2 + C2j-2X2j-l X 

and the uniform probability measure on the interval 

logfei-l) + C2j-lX2j X (-1, +1), 

it is readily seen that = on the support of p2i-i(-|'S'2,-2)- 

d7r2j-l{-\S2j-2) 

Using the induction hypothesis we deduce that 

dp2j-l dp2j-2 dp2j-l{-\S2j-2) , , , _i X , . 

-rr— = -nr— x , g < exp(72,-2)(l + a^s/.i) = exp(72j-i). 

a'K2j-l a7r2j-2 a'K2j-l{-\02j-2> 

We deduce in the same way that 

dp2j dp2j-l dp2j{-\S2j-l) , . 

A = ^ , ; L < exp(72j). 

Moreover the first step is easy to prove, taking for tti the uniform probability 
measure on the interval 

log[v + (m - 9if] + (1X2 X (-1, +1). 

This completes the proof of the lemma by induction. □

Let us now proceed with a second lemma.
Lemma 9.4 With probability at least $1 - 2\epsilon_{2i-1}$,

S2i-iexp{-(2i-i] 



Proof. Let us remark that 



< exp [n(7(^)] , 



where 

g{9) =a[v + {9-mf] 



(c-l)a% 
^2^-l + + 



m 



,212 



If ^2 

a[v + {9 - mf] - 62i-i 



Integrating the previous exponential moment with respect to p2i-i, and taking 
expectations with respect to the distribution of the sample we get, for any 

measurable mapping S'2i-i ^— a(5'2j-i) to be chosen afterward, 



E|jp2i-i(rf52i-i)exp^nQg^^_^ 5^^_^(a) - ng{92^-l) - 72*-i 

'C?/02i- 



<E{/p2.-i(d52._0l(^^>0 



X exp 



< E|/7r2i_i(c/^2i-i)exp^nQe2^_^ ,52^_^(a) - ng{92i-i) 
= /7r2i_i(c/S2i_i)E{exp nQQ^^_^ s^^_^{a) - ng(92i-i 



< 1. 



(Let us remember that 9 21^1 is the last component of 5*24-1. We made some de- 
pendences explicit, but not all of them, and more specifically the dependence of a 
here has been kept hidden.) 

Using Chebyshev's exponential inequality, we deduce from this moment in- 
equality that, for any measurable mapping ^^2i-i ^ tt(6'2i-i), (that is for any 
choice of a which may depend on the value of 92i-i), with probability at least 

1 — ^2i-l, 



Qe,._i,5,,_,[«(^2.-l)] <9{92^-l] 



72i_i + log(e2i_i 



n 



It is useful at this point to realize that the mapping a ^— Qe^i^o) is increasing 
for any 6* G R and 5 g)0, 1), as its derivative shows 



1 " 



:ii + a{Y,-ef + \[{Y, 



> 0. 



In order to choose a (which is allowed to depend on 621-1), let us introduce tem- 
porarily y = a\v + {92i~i — ra f] -52^-1. We can rephrase what we just proved 
saying that with probability at least 1 — 62,-1, 

y + 52i-i 



V + (92i-i - m) 



, (c-l)(y + ^2.-i)^ , , (c- 1)4-1 

^+ 2 +y+ — 2 — 



cy^ 



+ [(C- 1)52.-1 + I]l/+(C-I)4_i 



Let us choose y such that the argument of Q-^ ^ in this inequality is equal to 
— (c — 1)5|._^. This requires that y should satisfy 



cy" 



+ [(c - 1)52,-1 + 1] 2/ + 2(c - l)4_i = 0. 



This has (negative) real roots when 

^2i-l < 



2v/c(c-l)-(c-l) 
and it is elementary to check that the largest of these negative roots is 

y = -62i-ih[-^, (c - l)52i-i] = -[1 - exp(-2C2i-i)]52i-i- 
Thus with probability at least 1 — 62,-1, 
exp(-2C2i-i)52,-i 



V + {d2i-i - m) 



(9.3) 



To get the reverse inequality, we may notice, due to Equation ( |9.1[ page [24]) that 

1 " r 1 r 1 

Consequently 



i=l 



E 



,5 



< exp[ri^(^)], 
where



g{9) = exp|n5 — na[v + (9 — m)^] 
(c- 



+ 



[v + {9 - mfY + l\a[v + {9 - mf] -6 



Thus, integrating with respect to 7121-1 as previously, we deduce that with proba- 
bility at least 1 — e2j-i. 



-g[92i-i} 



72i_i + log(e2/_i' 



fc-1 



n 



2 S^i-i<Qe,^.uS2,-A^)^ 



where the choice of a may depend on 6'2i-i . Choosing then a 



we get that with probability at least 1 — e2t-i. 



«i.*.-.[-(--i)*=v.]< 



V + {92i-i - mf 



^21-1 



V + (92t-i - m) 



(9.4) 



Taking the union bound of inequalities (9.3 page 38) and (9.4 page 39), we see 
that with probability at least 1 — 2t2i-i, 

This can be rewritten as 



V + i92i-i - m) 



log [v + 



52i-iexp(-C2i-i; 



'3i_„.._.[-(--i)*i.-.] 



Therefore, coming back to the definition of ^2* i' we see that, with probability at 
least 1 - 2e2i-i, 

52i-iexp(-C2i-i) 



□ 



Lemma 9.5 With probability at least 1 — 2e2i, 92% = 9a.2,{92i~i)- 
Proof. We start from the exponential moment inequality 



Ejexp \na9a{9)\ } < exp [na5'(6')] , where g{9) = m + ^ [f + (m — 6')^] 



Integrating with respect to 7^21, we get, choosing the parameter a to be a2i, de- 
pending on S2i through 



E|/p2j(c?5'2j)exp 



na2i\9a^X^2i-l) - g{d2i-i 
dp2i 

d7T2i 



> 



X exp 



na2i 



[Oa2, ( 



— g[ti2i-i 



l2i 



log 



dp 



2i 



d7l2i 



< E<^ / 7r2i{dS2i) exp 



na2i [0a2^{{0)2i-l) - 9{62i-i)] 
= J 7r2i(rfS'2i)E|exp na2i [^02,(^2^-1) - g(d2i-i 
Thus, according to Chebyshev's inequality, with probability at least 1 — 62 



< 1. 



72» + log(e2o 
na2i 



< m + exp 



^1 + X2i)(2i-1 



2^2* [72i + log(e 



2i 



n 



m + (2i- 



In the same way, considering 621-1 — instead of Fj — 621-1, we can prove with 
probability at least 1 — 62^ that 

m<6^^X02^-l)+C2^■ 

A union bound argument then proves that with probability at least 1 — 2e2i, 

Coming back to the definition of 621, we see that it means that with probability at 
least 1 - 2e2i, 62^ = ^a2,(^2i-i)- □ 



We can now take a union bound of Lemma 9.4 (page 37 1 and Lemma 9.5 
(page 39), for i = 1, ... , k, to see that with probability at least 1 — 2 Ylfti the 



constructions of and 6i coincide with the definitions of and 6i, and therefore 
that 6i = 6i and gj = gi, i = 1, . . . , 2k. Consequently, with probability at least 
1-2 for any i = 1, . . . , A;, 

\'m - 62i\ < C2i, 
|log[v + (m - 62i-if] - log(g2i-i) | < (22-1, 



proving Proposition 5.2 (page 13).

9.10. Proof of Proposition 7.1 (page 20). Let us consider the function

\[ g(x) = x - \frac12\log\Biggl(\frac{1 + x + \frac{x^2}{2}}{1 - x + \frac{x^2}{2}}\Biggr), \quad x \in \mathbb{R}. \]



[Figure: plot of $x \mapsto g(x)$, with a zoom near the origin.]



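Lemma 9.6 The function $g$ is bounded by

\[ |g(x)| \le \min\Bigl\{\frac{|x|^3}{5}, \frac{3x^2}{10}, |x|\Bigr\}, \quad x \in \mathbb{R}. \]

The derivative of $g$ is

\[ g'(x) = 1 - \frac12\Biggl(\frac{1+x}{1 + x + \frac{x^2}{2}} + \frac{1-x}{1 - x + \frac{x^2}{2}}\Biggr) = \frac{x^2(2 + x^2)}{4 + x^4} \ge 0, \quad x \in \mathbb{R}, \]

showing that $g$ has the same sign as $x$. The fact that $|g(x)| \le |x|$ is then clear from the sign of $\frac12\log\Bigl(\frac{1 + x + \frac{x^2}{2}}{1 - x + \frac{x^2}{2}}\Bigr)$, which is the same as the sign of $x$.

Let us prove now that $|g(x)| \le \frac{|x|^3}{5}$. It is clearly enough to prove it for $x \ge 0$, because $|g|$ is symmetric. Let us consider $h(x) = g(x) - \frac{x^3}{5}$ and let us compute

\[ h'(x) = \frac{x^2}{4 + x^4}\Bigl(-\frac25 + x^2 - \frac{3x^4}{5}\Bigr) = -\frac{x^2(x^2 - 1)(3x^2 - 2)}{5(4 + x^4)}. \]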
From the sign of $h'$, we see that $h$ has a unique local maximum on the positive real line at point $x = 1$. Moreover $h(1) = \frac45 - \frac12\log(5) < 0$ (it is close to $-0.005$, as can be checked numerically). Thus $h(x) \le 0$ for $x \in \mathbb{R}_+$, implying that $|g(x)| \le \frac{|x|^3}{5}$ on the whole real line, as announced.

To prove $|g(x)| \le \frac{3x^2}{10}$, we consider $h_2(x) = g(x) - \frac{3x^2}{10}$. A small computation shows that

\[ h_2'(x) = -\frac{x(x-1)(x-2)(3x^2 + 4x + 6)}{5(4 + x^4)}. \]

Thus it has a unique local maximum on the positive real line at point 2. Moreover $h_2(2) = \frac45 - \frac12\log(5) < 0$, showing that $|g|$ is upper-bounded by $\frac{3x^2}{10}$ on the positive real line, and therefore on the whole real line because it is symmetric.
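
The three bounds of Lemma 9.6 can be checked numerically on a grid. The sketch below assumes the definition $g(x) = x - \frac12\log\bigl((1 + x + x^2/2)/(1 - x + x^2/2)\bigr)$ used in this proof; the grid and tolerance are arbitrary, and the check is an illustration rather than a proof.

```python
import math


def g(x):
    """g(x) = x - (1/2) log((1 + x + x^2/2) / (1 - x + x^2/2))."""
    return x - 0.5 * math.log((1 + x + x * x / 2) / (1 - x + x * x / 2))


def g_bound(x):
    """min(|x|^3 / 5, 3 x^2 / 10, |x|), the bound of Lemma 9.6."""
    return min(abs(x) ** 3 / 5, 3 * x * x / 10, abs(x))


grid = [k / 100 for k in range(-1000, 1001)]  # x in [-10, 10], step 0.01
print(all(abs(g(x)) <= g_bound(x) + 1e-12 for x in grid))
print(g(1.0) - 1 / 5)   # h(1), close to -0.005
print(g(2.0) - 6 / 5)   # h_2(2), the same value
```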

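Let us remark now that with probability at least $1 - \epsilon$,

\[ n\alpha(M - m) - \sum_{i=1}^n g\bigl[\alpha(Y_i - m)\bigr] \le \sum_{i=1}^n \log\Bigl[1 + \alpha(Y_i - m) + \frac{\alpha^2}{2}(Y_i - m)^2\Bigr] \le \frac{n\alpha^2}{2}v + \log(\epsilon^{-1}). \]

The first of these two inequalities comes from the fact that

\[ x - g(x) \le \log\Bigl(1 + x + \frac{x^2}{2}\Bigr), \quad x \in \mathbb{R}. \]

In the same way, with probability at least $1 - \epsilon$,

\[ n\alpha(M - m) - \sum_{i=1}^n g\bigl[\alpha(Y_i - m)\bigr] \ge -\sum_{i=1}^n \log\Bigl[1 - \alpha(Y_i - m) + \frac{\alpha^2}{2}(Y_i - m)^2\Bigr] \ge -\frac{n\alpha^2}{2}v - \log(\epsilon^{-1}). \]

Let us now deal with $\sum_{i=1}^n g\bigl[\alpha(Y_i - m)\bigr]$. We need some compact notations to manipulate this. Let $G_i = g\bigl[\alpha(Y_i - m)\bigr]$ and $G = g\bigl[\alpha(Y - m)\bigr]$. Let us remark that

\[ |G| \le \min\Bigl\{\frac{\alpha^3}{5}|Y - m|^3, \frac{3\alpha^2}{10}(Y - m)^2, \alpha|Y - m|\Bigr\}. \]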
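Moreover, using the fact that $\min\{a, b\} \le a^{1/3}b^{2/3}$, we see that

\[ |G| \le \Bigl(\frac{3}{10}\Bigr)^{1/3}\alpha^{4/3}|Y - m|^{4/3}. \]

With probability at least $1 - \epsilon$,

\[ \Bigl|\frac1n\sum_{i=1}^n G_i - \mathbb{E}(G)\Bigr| \le \epsilon^{-1/4}\,\mathbb{E}\Bigl\{\Bigl[\frac1n\sum_{i=1}^n\bigl(G_i - \mathbb{E}(G)\bigr)\Bigr]^4\Bigr\}^{1/4} \]
\[ = (\epsilon n^3)^{-1/4}\Bigl\{3(n-1)\bigl[\mathbb{E}(G^2) - \mathbb{E}(G)^2\bigr]^2 + \mathbb{E}\bigl\{[G - \mathbb{E}(G)]^4\bigr\}\Bigr\}^{1/4} \]
\[ = (\epsilon n^3)^{-1/4}\Bigl\{3(n-1)\bigl[\mathbb{E}(G^2)^2 - 2\mathbb{E}(G^2)\mathbb{E}(G)^2 + \mathbb{E}(G)^4\bigr] + \mathbb{E}(G^4) - 4\mathbb{E}(G^3)\mathbb{E}(G) + 6\mathbb{E}(G^2)\mathbb{E}(G)^2 - 3\mathbb{E}(G)^4\Bigr\}^{1/4} \]
\[ = (\epsilon n^3)^{-1/4}\Bigl\{3(n-1)\mathbb{E}(G^2)^2 - 6(n-2)\mathbb{E}(G^2)\mathbb{E}(G)^2 + 3(n-2)\mathbb{E}(G)^4 + \mathbb{E}(G^4) - 4\mathbb{E}(G^3)\mathbb{E}(G)\Bigr\}^{1/4} \]
\[ \le (\epsilon n^3)^{-1/4}\Bigl\{3(n-1)\mathbb{E}(G^2)^2 + \mathbb{E}(G^4) + 4\mathbb{E}(|G|^3)\mathbb{E}(|G|)\Bigr\}^{1/4} \]
\[ \le (\epsilon n^3)^{-1/4}\Bigl\{3(n-1)\Bigl[\frac{9}{100}\alpha^4\mathbb{E}[(Y-m)^4]\Bigr]^2 + \alpha^4\mathbb{E}[(Y-m)^4] + \frac{6}{25}\alpha^7\mathbb{E}[(Y-m)^4]\,\mathbb{E}[|Y-m|^3]\Bigr\}^{1/4}. \]

Let us now use the fact that $\mathbb{E}[(Y - m)^4] \le \kappa v^2$ (and therefore $\mathbb{E}[|Y-m|^3] \le \kappa^{3/4}v^{3/2}$) to deduce that

\[ \Bigl|\frac1n\sum_{i=1}^n G_i - \mathbb{E}(G)\Bigr| \le (\epsilon n^3)^{-1/4}\Bigl\{\frac{243(n-1)}{10^4}\alpha^8\kappa^2 v^4 + \alpha^4\kappa v^2 + \frac{6}{25}\alpha^7\kappa^{7/4}v^{7/2}\Bigr\}^{1/4}. \]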
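Let us set $\alpha = \sqrt{\frac{2\log(\epsilon^{-1})}{nv}}$; for this value of $\alpha$,

\[ \frac{1}{n\alpha}\Bigl|\sum_{i=1}^n G_i - n\mathbb{E}(G)\Bigr| \le \Bigl(\frac{\kappa v^2}{\epsilon n^3}\Bigr)^{1/4}\Bigl(1 + \frac{243(n-1)\log(\epsilon^{-1})^2\kappa}{2500 n^2} + \frac{12\sqrt2\log(\epsilon^{-1})^{3/2}\kappa^{3/4}}{25 n^{3/2}}\Bigr)^{1/4}. \]

Let us remark also that

\[ \frac{|\mathbb{E}(G)|}{\alpha} \le \frac{\alpha^2}{5}\mathbb{E}(|Y - m|^3) \le \frac{2\log(\epsilon^{-1})\kappa^{3/4}\sqrt{v}}{5n}. \]

Putting all this together, we see that with probability at least $1 - 3\epsilon$,

\[ \frac{|M - m|}{\sqrt{v}} \le \sqrt{\frac{2\log(\epsilon^{-1})}{n}} + \frac{2\log(\epsilon^{-1})\kappa^{3/4}}{5n} + \Bigl(\frac{\kappa}{\epsilon n^3}\Bigr)^{1/4}\Bigl(1 + \frac{243(n-1)\log(\epsilon^{-1})^2\kappa}{2500 n^2} + \frac{12\sqrt2\log(\epsilon^{-1})^{3/2}\kappa^{3/4}}{25 n^{3/2}}\Bigr)^{1/4}. \]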



The result stated in Proposition 7.1 (page 20) is then obtained by replacing $\epsilon$ with $\frac23\epsilon$, to get an event with confidence level $1 - 2\epsilon$ as elsewhere in this paper.

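Let us remark that the following proposition, based on the Chebyshev inequality applied directly to the fourth moment of the empirical mean, does not provide the right speed when $\epsilon$ is small and $n$ large.

Proposition 9.7 For any probability distribution whose kurtosis is not greater than $\kappa$, the empirical mean $M$ is such that with probability at least $1 - 2\epsilon$,

\[ |M - m| \le \Bigl(\frac{3(n-1) + \kappa}{2n\epsilon}\Bigr)^{1/4}\sqrt{\frac{v}{n}}. \]

Proof. Let us assume to simplify notations and without loss of generality that $\mathbb{E}(Y) = 0$. Then

\[ \mathbb{E}(M^4) = \frac{1}{n^4}\Bigl[\sum_{i=1}^n\mathbb{E}(Y_i^4) + 6\sum_{i<j}\mathbb{E}(Y_i^2)\mathbb{E}(Y_j^2)\Bigr] \le \frac{[3(n-1) + \kappa]v^2}{n^3}. \]

It implies that

\[ \mathbb{P}\bigl(|M - m| \ge \eta\bigr) \le \frac{\mathbb{E}(M^4)}{\eta^4} \le \frac{[3(n-1) + \kappa]v^2}{n^3\eta^4}, \]

and the result is proved by considering $2\epsilon = \frac{[3(n-1) + \kappa]v^2}{n^3\eta^4}$. □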
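
The fourth moment identity used in this proof can be verified exactly on a small example. The sketch below takes symmetric ±1 variables (for which $v = 1$ and $\kappa = 1$, an illustrative choice of ours), enumerates all sign patterns, and compares $\mathbb{E}(M^4)$ with $[3(n-1) + \kappa]v^2/n^3$ in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def fourth_moment_of_mean(n):
    """Exact E(M^4) for the mean M of n independent uniform +-1 variables."""
    total = sum(Fraction(sum(signs), n) ** 4 for signs in product((-1, 1), repeat=n))
    return total / 2 ** n


def fourth_moment_formula(n, kappa, v):
    """[3(n-1) + kappa] v^2 / n^3, with equality when the kurtosis equals kappa."""
    return Fraction(3 * (n - 1) + kappa) * v ** 2 / n ** 3


for n in range(1, 9):
    print(n, fourth_moment_of_mean(n), fourth_moment_formula(n, kappa=1, v=1))
```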
9.11. Proof of Proposition 8.1 (page 20). Let us consider the distributions $P_1$ and $P_2$ of the sample $(Y_i)_{i=1}^n$ obtained when the marginal distributions are respectively the Gaussian measure with variance $v$ and mean $m_1 = -\eta$ and the Gaussian measure with variance $v$ and mean $m_2 = \eta$. We see that, whatever the estimator $\hat\theta$,

\[ P_1(\hat\theta \ge m_1 + \eta) + P_2(\hat\theta \le m_2 - \eta) = P_1(\hat\theta \ge 0) + P_2(\hat\theta \le 0) \]
\[ \ge (P_1 \wedge P_2)(\hat\theta \ge 0) + (P_1 \wedge P_2)(\hat\theta < 0) = |P_1 \wedge P_2|, \]

where $P_1 \wedge P_2$ is the measure whose density with respect to the Lebesgue measure (or equivalently with respect to any dominating measure, such as $P_1 + P_2$) is the minimum of the densities of $P_1$ and $P_2$ and whose total variation is $|P_1 \wedge P_2|$.

Now, using the fact that the empirical mean is a sufficient statistic of the Gaussian shift model, it is easy to realize that

\[ |P_1 \wedge P_2| = P_1(M \ge m_1 + \eta) + P_2(M \le m_2 - \eta), \]

which obviously proves the proposition.
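
The last identity can be illustrated numerically. By sufficiency, the total mass of $P_1 \wedge P_2$ equals the overlap between the two laws of $M$, namely $N(-\eta, v/n)$ and $N(\eta, v/n)$. The sketch below computes this overlap with `NormalDist.overlap` from the Python standard library and compares it with the error probabilities of the empirical mean; the function names and the parameter values are illustrative assumptions.

```python
import math
from statistics import NormalDist


def overlap_mass(eta, v, n):
    """Mass of the minimum of the two laws of M, N(-eta, v/n) and N(eta, v/n)."""
    sd = math.sqrt(v / n)
    return NormalDist(-eta, sd).overlap(NormalDist(eta, sd))


def empirical_mean_errors(eta, v, n):
    """P_1(M >= m_1 + eta) + P_2(M <= m_2 - eta), with m_1 = -eta and m_2 = eta."""
    sd = math.sqrt(v / n)
    p1 = 1 - NormalDist(-eta, sd).cdf(0.0)
    p2 = NormalDist(eta, sd).cdf(0.0)
    return p1 + p2


print(overlap_mass(0.05, 1.0, 2000), empirical_mean_errors(0.05, 1.0, 2000))
```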



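9.12. Proof of Proposition 8.2 (page 22). Let us consider the distribution with support $\{-n\eta, 0, n\eta\}$ defined by

\[ P(\{n\eta\}) = P(\{-n\eta\}) = \frac{1 - P(\{0\})}{2} = \frac{v}{2n^2\eta^2}. \]

It satisfies $\mathbb{E}(Y) = 0$, $\mathbb{E}(Y^2) = v$ and

\[ \mathbb{P}(M \ge \eta) = \mathbb{P}(M \le -\eta) \ge \mathbb{P}(M = \eta) \ge \frac{v}{2n\eta^2}\Bigl(1 - \frac{v}{n^2\eta^2}\Bigr)^{n-1}. \]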
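
For small $n$, the tail probability of this three point distribution can be computed exactly and compared with the lower bound. In the sketch below, $\mathbb{P}(M \ge \eta)$ is obtained by enumerating the number of observations equal to $+n\eta$ and $-n\eta$ (the event is that the first count exceeds the second). The function names and the numerical values are illustrative assumptions.

```python
import math


def point_mass(n, eta, v=1.0):
    """Probability of each of the two extreme points -n*eta and +n*eta."""
    return v / (2 * n ** 2 * eta ** 2)


def tail_lower_bound(n, eta, v=1.0):
    """(v / (2 n eta^2)) (1 - v / (n^2 eta^2))^(n-1)."""
    p = point_mass(n, eta, v)
    return n * p * (1 - 2 * p) ** (n - 1)


def exact_upper_tail(n, eta, v=1.0):
    """Exact P(M >= eta): more observations at +n*eta than at -n*eta."""
    p = point_mass(n, eta, v)
    total = 0.0
    for plus in range(1, n + 1):
        for minus in range(0, min(plus, n - plus + 1)):
            zeros = n - plus - minus
            ways = math.comb(n, plus) * math.comb(n - plus, minus)
            total += ways * p ** (plus + minus) * (1 - 2 * p) ** zeros
    return total


print(tail_lower_bound(10, 0.5), exact_upper_tail(10, 0.5))
```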



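9.13. Proof of Proposition 8.3 (page 23). Let us consider for $Y$ the following distribution, with support $\{-n\eta, -\xi, \xi, n\eta\}$, where $\xi$ and $\eta$ are two positive real parameters, to be adjusted to obtain the desired variance and kurtosis.

\[ \mathbb{P}(Y = -n\eta) = \mathbb{P}(Y = n\eta) = q, \]
\[ \mathbb{P}(Y = -\xi) = \mathbb{P}(Y = \xi) = \frac12 - q. \]

In this case

\[ m = 0, \]
\[ \mathbb{E}(Y^2) = v = (1 - 2q)\xi^2 + 2qn^2\eta^2, \]
\[ \mathbb{E}(Y^3) = 0, \]
\[ \mathbb{E}(Y^4) = (1 - 2q)\xi^4 + 2qn^4\eta^4. \]

Let us choose $\xi$ such that $v = 1$. This is done by putting

\[ \xi^2 = \frac{1 - 2qn^2\eta^2}{1 - 2q}. \]

The kurtosis of the distribution defined by $q$ and $\eta$, the two remaining free parameters once $\xi$ has been set as explained, is equal to

\[ \kappa = \frac{(1 - 2qn^2\eta^2)^2}{1 - 2q} + 2qn^4\eta^4. \]

It is easily seen that

\[ \mathbb{P}(M \ge \eta) = \mathbb{P}(M \le -\eta) \ge \frac{nq}{2}(1 - 2q)^{n-1}. \]

Indeed,

\[ \mathbb{P}(M \ge \eta) \ge \mathbb{P}\Bigl(\bigcup_{i=1}^n\Bigl\{Y_i = n\eta;\ Y_j \in \{-\xi, +\xi\},\ j \ne i;\ \sum_{j \ne i}Y_j \ge 0\Bigr\}\Bigr) \]
\[ = n\,\mathbb{P}(Y_1 = n\eta)\,\mathbb{P}\Bigl(Y_j \in \{-\xi, +\xi\},\ j = 2, \dots, n;\ \sum_{j=2}^n Y_j \ge 0\Bigr) \]
\[ \ge \frac{nq}{2}\,\mathbb{P}\bigl(Y_j \in \{-\xi, +\xi\},\ j = 2, \dots, n\bigr) = \frac{nq}{2}(1 - 2q)^{n-1}. \]

Starting from $\epsilon \le (4e)^{-1}$, and $c \ge 1 + 1/n$, we can define a probability distribution by choosing

\[ q = \frac{2\epsilon}{n}\Bigl(1 - \frac{4e\epsilon}{n}\Bigr)^{-(n-1)} \le \frac{2e\epsilon}{n} \le \frac{1}{2n}, \]
\[ \eta = \Bigl(\frac{c-1}{2qn^4}\Bigr)^{1/4} = \Bigl(\frac{c-1}{4n^3\epsilon}\Bigr)^{1/4}\Bigl(1 - \frac{4e\epsilon}{n}\Bigr)^{(n-1)/4}, \]

whose kurtosis $\kappa$ will not be greater than $c$, since in this case

\[ \kappa = \frac{(1 - 2qn^2\eta^2)^2}{1 - 2q} + 2qn^4\eta^4 \le 1 + 2qn^4\eta^4 \le c, \]

and for which

\[ \mathbb{P}\bigl(|M - m| \ge \eta\bigr) \ge nq(1 - 2q)^{n-1} \ge nq\Bigl(1 - \frac{4e\epsilon}{n}\Bigr)^{n-1} = 2\epsilon. \]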
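
The construction can be checked numerically for given $n$, $c$ and $\epsilon$. The sketch below assumes the choices $q = \frac{2\epsilon}{n}(1 - \frac{4e\epsilon}{n})^{-(n-1)}$ and $2qn^4\eta^4 = c - 1$, which is our reading of the garbled display and should be treated as an assumption. It then verifies that the variance equals one, that the kurtosis does not exceed $c$, and that $nq(1-2q)^{n-1} \ge 2\epsilon$. Function names are hypothetical.

```python
import math


def worst_case_parameters(n, c, eps):
    """Return (q, eta, xi) for the four point distribution (assumed choices, see lead-in)."""
    q = (2 * eps / n) * (1 - 4 * math.e * eps / n) ** (-(n - 1))
    eta = ((c - 1) / (2 * q * n ** 4)) ** 0.25
    xi_squared = (1 - 2 * q * n ** 2 * eta ** 2) / (1 - 2 * q)
    if xi_squared < 0:
        raise ValueError("parameters out of range: xi^2 would be negative")
    return q, eta, math.sqrt(xi_squared)


def second_and_fourth_moments(n, q, eta, xi):
    second = (1 - 2 * q) * xi ** 2 + 2 * q * (n * eta) ** 2
    fourth = (1 - 2 * q) * xi ** 4 + 2 * q * (n * eta) ** 4
    return second, fourth


def deviation_probability_bound(n, q):
    """n q (1 - 2q)^(n-1), a lower bound for P(|M - m| >= eta)."""
    return n * q * (1 - 2 * q) ** (n - 1)


n, c, eps = 2000, 6.0, 1e-3
q, eta, xi = worst_case_parameters(n, c, eps)
second, fourth = second_and_fourth_moments(n, q, eta, xi)
print(second, fourth / second ** 2, deviation_probability_bound(n, q), eta)
```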
10. Generalizations

10.1. Non identically distributed independent random variables. The assumption that the sample is identically distributed can be dropped. Indeed, assuming only that the random variables $(Y_i)_{i=1}^n$ are independent, meaning that their joint distribution is of the product form $\bigotimes_{i=1}^n P_i$, we can still write, for $W_i = \pm\alpha(Y_i - \theta)$ or $W_i = \pm[\alpha(Y_i - \theta)^2 - \delta]$,

\[ \mathbb{E}\Bigl\{\exp\Bigl[\sum_{i=1}^n\log\Bigl(1 + W_i + \frac{W_i^2}{2}\Bigr)\Bigr]\Bigr\} = \prod_{i=1}^n\Bigl[1 + \mathbb{E}(W_i) + \frac{\mathbb{E}(W_i^2)}{2}\Bigr] \le \exp\Bigl\{n\log\Bigl[1 + \frac1n\sum_{i=1}^n\mathbb{E}(W_i) + \frac{1}{2n}\sum_{i=1}^n\mathbb{E}(W_i^2)\Bigr]\Bigr\}. \]

Starting from these exponential inequalities, we can reach the same conclusions as in the i.i.d. case, as long as we set

\[ m = \frac1n\sum_{i=1}^n\mathbb{E}(Y_i) \quad\text{and}\quad v = \frac1n\sum_{i=1}^n\mathbb{E}\bigl[(Y_i - m)^2\bigr]. \]

Thus here, the role that is played by the marginal sample distribution in the i.i.d. case is played by the mean marginal sample distribution $\frac1n\sum_{i=1}^n P_i$. As moreover, the empirical mean $M$ still satisfies

\[ \mathbb{E}\bigl[(M - m)^2\bigr] = \frac{v}{n} - \frac{1}{n^2}\sum_{i=1}^n\bigl[\mathbb{E}(Y_i) - m\bigr]^2 \le \frac{v}{n}, \]

we see that Propositions 1.1 (page 4), 1.2 (page 5), 2.1 (page 5), 3.1 (page 7), 4.3 (page 10), and 5.2 (page 13) remain true, the proofs being unchanged, except for the starting inequalities mentioned above, and the kurtosis coefficients being those of the mean sample distribution $\frac1n\sum_{i=1}^n P_i$.



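10.2. Simpler truncating function. We can use the simpler truncation function

\[ L(x) = \max\bigl\{-1, \min\{+1, x\}\bigr\}. \]

Let $\lambda$ be the positive root of the equation

\[ -\frac{1}{\lambda}\log\Bigl(1 - \frac{\lambda^2}{4[\exp(\lambda) - 1 - \lambda]}\Bigr) = 1. \]

Let us define the upper and lower bounds

\[ L_+(x) = \frac{1}{\lambda}\log\bigl\{1 + \lambda x + [\exp(\lambda) - 1 - \lambda]x^2\bigr\}, \]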

^'^^^^ = bg(2) ^ ^^^^'^"^ ^"^^ " " ^]^' }' 

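\[ L_-(x) = -L_+(-x), \]
\[ L'_-(x) = -L'_+(-x). \]

Numerically, $0.535 \le \lambda \le 0.536$. Moreover $\exp(\lambda) - 1 - \lambda = \frac{a\lambda^2}{2}$, with $a = \frac{2[\exp(\lambda) - 1 - \lambda]}{\lambda^2}$.

Lemma 10.1 They are such that

\[ L_-(x) \le L(x) \le L_+(x), \quad x \in \mathbb{R}, \qquad (10.1) \]
\[ L'_-(x) \le L(x) \le L'_+(x), \quad x \in \mathbb{R}. \qquad (10.2) \]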

Proof. Let us consider the function $f(x) = \exp[\lambda L_+(x)] - \exp[\lambda L(x)]$. It is such that

\[ f(x) = 1 + \lambda x + \frac{a\lambda^2 x^2}{2} - \exp[\lambda L(x)], \]
\[ f'(x) = \begin{cases} \lambda + a\lambda^2 x - \lambda\exp(\lambda x), & x \in (-1, +1), \\ \lambda + a\lambda^2 x, & x \notin (-1, +1), \end{cases} \]
\[ f''(x) = \begin{cases} \lambda^2[a - \exp(\lambda x)], & x \in (-1, +1), \\ \lambda^2 a, & x \notin (-1, +1). \end{cases} \]

Since $f(0) = f'(0) = 0$ and $f''(x) > 0$, $-1 < x < \frac{\log(a)}{\lambda}$, $f''(x) < 0$, $\frac{\log(a)}{\lambda} < x < 1$, and $f(1) = 0$, we see that $f(x) \ge 0$, $x \in (-1, +1)$. Moreover, $f$ is quadratic on $(-\infty, -1)$ and reaches its minimum at point $x = -\frac{1}{a\lambda}$, thus it is non negative on the whole line when this minimum value is non negative, that is when $1 - \frac{1}{2a} - \exp(-\lambda) \ge 0$, which is satisfied according to the definition of $\lambda$. This proves that $L(x) \le L_+(x)$, $x \in \mathbb{R}$. The fact that $L_-(x) \le L(x)$ is then a consequence of $L(x) = -L(-x)$. The proof of $L(x) \le L'_+(x)$ is done similarly by analyzing the shape of the function $x \mapsto \exp[\log(2)L'_+(x)] - \exp[\log(2)L(x)]$.
□ 
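
The constants of this section are easy to recover numerically. The sketch below solves the defining equation of $\lambda$ by bisection, written in the equivalent form $1 - \frac{\lambda^2}{4[\exp(\lambda) - 1 - \lambda]} - \exp(-\lambda) = 0$, computes $a$, and checks inequality (10.1) on a grid. The bracketing interval, the grid, the tolerance and the function names are assumptions made for illustration.

```python
import math


def defining_gap(lam):
    """1 - lam^2 / (4 (exp(lam) - 1 - lam)) - exp(-lam); lambda is its positive root."""
    return 1 - lam ** 2 / (4 * (math.exp(lam) - 1 - lam)) - math.exp(-lam)


def solve_lambda(lo=0.1, hi=1.0, iterations=200):
    """Bisection; the gap is negative at lo = 0.1 and positive at hi = 1.0."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if defining_gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def L(x):
    return max(-1.0, min(1.0, x))


def L_plus(x, lam):
    return math.log(1 + lam * x + (math.exp(lam) - 1 - lam) * x * x) / lam


def L_minus(x, lam):
    return -L_plus(-x, lam)


lam = solve_lambda()
a = 2 * (math.exp(lam) - 1 - lam) / lam ** 2
grid = [k / 100 for k in range(-500, 501)]
print(lam, a, math.sqrt(a))
print(all(L_minus(x, lam) - 1e-9 <= L(x) <= L_plus(x, lam) + 1e-9 for x in grid))
```

The value of $\sqrt{a}$ printed here is the factor by which the deviation bound degrades when this truncation is used.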



[Figure: plot of $x \mapsto L(x)$, together with $L_+(x)$, $L'_+(x)$, $L_-(x)$ and $L'_-(x)$; a second panel shows the differences $\exp[\lambda L_+(x)] - \exp[\lambda L(x)]$ and $\exp[\lambda L'_+(x)] - \exp[\lambda L(x)]$.]



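If we redefine now our truncated mean estimate as

\[ \hat\theta_\alpha(\theta_0) = \theta_0 + \frac{\lambda}{n\alpha}\sum_{i=1}^n L\Bigl[\frac{\alpha}{\lambda}(Y_i - \theta_0)\Bigr] \]

and define $a = \frac{2[\exp(\lambda) - 1 - \lambda]}{\lambda^2} \simeq 1.2$, we deduce that

\[ \mathbb{E}\Bigl\{\exp\Bigl[n\alpha\bigl[\hat\theta_\alpha(\theta_0) - \theta_0\bigr]\Bigr]\Bigr\} \le \exp\Bigl\{n\log\Bigl(1 + \alpha(m - \theta_0) + \frac{a\alpha^2}{2}\bigl[v + (m-\theta_0)^2\bigr]\Bigr)\Bigr\}. \]

Therefore, with probability at least $1 - \epsilon$,

\[ \hat\theta_\alpha(\theta_0) \le \theta_0 + \frac{1}{\alpha}\log\Bigl\{1 + \alpha(m - \theta_0) + \frac{a\alpha^2}{2}\bigl[v + (m - \theta_0)^2\bigr]\Bigr\} + \frac{\log(\epsilon^{-1})}{n\alpha} \le m + \frac{a\alpha}{2}\bigl[v + (m-\theta_0)^2\bigr] + \frac{\log(\epsilon^{-1})}{n\alpha}. \]

Working out the reverse inequality in the same way gives the following variant of Proposition 1.2 (page 5).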



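Proposition 10.2 Assume that $v \le v_0$ and $|m - \theta_0| \le \delta_0$, where $v_0$ and $\delta_0$ are known prior bounds. With probability at least $1 - 2\epsilon$,

\[ |\hat\theta_\alpha(\theta_0) - m| \le \frac{a\alpha}{2}(v_0 + \delta_0^2) + \frac{\log(\epsilon^{-1})}{n\alpha}. \]

When $\alpha = \sqrt{\frac{2\log(\epsilon^{-1})}{an(v_0 + \delta_0^2)}}$, with probability at least $1 - 2\epsilon$,

\[ |\hat\theta_\alpha(\theta_0) - m| \le \sqrt{\frac{2a(v_0 + \delta_0^2)\log(\epsilon^{-1})}{n}} \le 1.1\sqrt{\frac{2(v_0 + \delta_0^2)\log(\epsilon^{-1})}{n}}. \]

So there is a ten per cent loss of accuracy with respect to Proposition 1.2 (page 5): this is the price to pay for using a simpler truncation function. We let the reader derive by himself the equivalent of the iterated estimate of Proposition 2.1 (page 5). When there is a third moment, we can also work with Equation (10.2 page 48), instead of Equation (10.1 page 48).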
11. Some concluding remarks

We would like to end this paper by sharing some guess about what is going on 
behind the scene. The need for thresholding indicates that large values may not 
be reliable, and have, so to speak, a bad "signal to noise ratio". This is somehow 
understandable, since values whose deviation from the mean is much larger than 
the standard deviation have to appear in the sample with a small and therefore hard 
to estimate probability whereas their large size gives them a strong impact on the 
empirical mean, which makes their contributions to this estimate even worse. 